Haoyuan Li | |
---|---|
Alma mater | UC Berkeley (Ph.D.), Cornell University (M.S.), Peking University (B.S.) |
Known for | Alluxio |
Scientific career | |
Fields | Computer Science |
Thesis | Alluxio: A Virtual Distributed File System (2018) |
Doctoral advisor | Ion Stoica, Scott Shenker |
Website | haoyuanli |
Haoyuan (H.Y.) Li is a computer scientist and entrepreneur specializing in distributed systems, big data, and cloud computing. He is best known for proposing the Virtual Distributed File System (VDFS) [1] and creating Alluxio, an open-source data orchestration system. He is the founder, chairman, and CEO of Alluxio, Inc., [2] [3] a company commercializing the Alluxio data orchestration technology. He is also an adjunct professor at Peking University and a frequent conference speaker on AI, big data, cloud computing, and open source.
Li was born and raised in China. He attended Peking University, where he received a BS in Computer Science. While at university, he represented Peking University in programming contests, placing 11th worldwide (bronze medal) in the ACM ICPC in 2005 and 13th worldwide in 2006. He then studied at Cornell University, where he received an MS in Computer Science.
He received his PhD in Computer Science [1] from UC Berkeley, where he worked in the AMPLab under the supervision of Prof. Ion Stoica and Prof. Scott Shenker. During his PhD, he co-created the Alluxio (a.k.a. Tachyon) open-source project, [4] which was commercialized by the San Francisco Bay Area venture-backed company Alluxio, Inc. [1] [5] [6] [7] [8] [9] He was a co-founder of Alluxio, Inc.
During his PhD, he also co-created the Apache Spark Streaming project [10] and became an Apache Spark committer. [11]
The University of California, Berkeley College of Engineering is the public engineering school of the University of California, Berkeley. Established in 1931, the college occupies fourteen buildings on the northeast side of the main campus and also operates the 150-acre (61-hectare) Richmond Field Station. It is considered to be highly selective and is consistently ranked among the top engineering schools in both the nation and the world.
Data Stream Mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that in many applications of data stream mining can be read only once or a small number of times using limited computing and storage capabilities.
Scott J. Shenker is an American computer scientist, and professor of computer science at the University of California, Berkeley. He is also the leader of the Extensible Internet Group at the International Computer Science Institute in Berkeley, California.
Randy Howard Katz is a distinguished professor emeritus in the electrical engineering and computer science department at the University of California, Berkeley.
Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.
Ion Stoica is a Romanian–American computer scientist specializing in distributed systems, cloud computing and computer networking. He is a professor of computer science at the University of California, Berkeley and co-director of AMPLab. He co-founded Conviva and Databricks with other original developers of Apache Spark.
Matei Zaharia is a Romanian-Canadian computer scientist, educator and the creator of Apache Spark.
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark.
Apache Mesos is an open-source project to manage computer clusters. It was developed at the University of California, Berkeley.
Presto is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata, and allows use of multiple data sources within a query. Presto is community-driven open-source software released under the Apache License.
The ACM SIGOPS Mark Weiser Award is awarded to an individual who has shown creativity and innovation in operating system research. The recipients began their career no earlier than 20 years prior to nomination. The special-interest-group-level award was created in 2001 and is named after Mark Weiser, the father of ubiquitous computing.
Apache Kylin is an open-source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio, supporting extremely large datasets.
Ali Ghodsi is a Swedish computer scientist and entrepreneur of Persian origin, specializing in distributed systems and big data. He is a co-founder and CEO of Databricks and an adjunct professor at UC Berkeley. He coauthored several influential papers, including those on Apache Mesos and Apache Spark SQL.
AMPLab was a University of California, Berkeley lab focused on big data analytics, located in Soda Hall. The name stands for the Algorithms, Machines and People Lab. It began publishing papers in 2008 and was officially launched in 2011. The AMPLab was co-directed by Professors Michael J. Franklin, Michael I. Jordan, and Ion Stoica.
Reynold Xin is a computer scientist and engineer specializing in big data, distributed systems, and cloud computing. He is a co-founder and Chief Architect of Databricks. He is best known for his work on Apache Spark, a leading open-source Big Data project. He was designer and lead developer of the GraphX, Project Tungsten, and Structured Streaming components and he co-designed DataFrames, all of which are part of the core Apache Spark distribution; he also served as the release manager for Spark's 2.0 release.
Alluxio is an open-source virtual distributed file system (VDFS). Initially a research project named "Tachyon", Alluxio was created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. thesis, advised by Professor Scott Shenker and Professor Ion Stoica. Alluxio sits between computation and storage in the big data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License.
Mosharaf Chowdhury is a Bangladeshi-American computer scientist known for his contributions to the fields of computer networking and large-scale systems for emerging machine learning and big data workloads. He is an Associate Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor and leads SymbioticLab. He is the creator of coflow and the co-creator of Apache Spark.
Dominant resource fairness (DRF) is a rule for fair division. It is particularly useful for dividing computing resources among users in cloud computing environments, where each user may require a different combination of resources. DRF was presented by Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker and Ion Stoica in 2011.
DBOS is a Database-Oriented Operating System designed to simplify and improve the scalability, security and resilience of large-scale distributed applications. It started in 2020 as a joint open source project with MIT, Stanford and Carnegie Mellon University, after a brainstorm between Michael Stonebraker and Matei Zaharia on how to scale and improve scheduling and performance of millions of Apache Spark tasks.