Ali Ghodsi

Ali Ghodsi
Born	December 1978 (age 45), Iran^[1]^[2]
Citizenship	Swedish
Alma mater	KTH Royal Institute of Technology (Ph.D.)
Known for	Apache Mesos, Apache Spark, Co-founding Databricks
Title	CEO of Databricks
	Scientific career
Fields	Computer Science
Institutions	Databricks; UC Berkeley
Thesis	Distributed k-ary System: Algorithms for Distributed Hash Tables (2006)
Doctoral advisor	Seif Haridi
Website	www.cs.berkeley.edu/~alig

Last updated April 13, 2024

Ali Ghodsi (born December 1978)^[3] is a Persian computer scientist and entrepreneur^[4] specializing in distributed systems and big data. He is a co-founder and the CEO of Databricks^[5]^[6]^[7] and an adjunct professor at UC Berkeley. He coauthored several influential papers, including those introducing Apache Mesos^[8] and Spark SQL.^[9]

Ghodsi received his PhD from KTH Royal Institute of Technology in Sweden, advised by Seif Haridi. He was a co-founder of Peerialism AB, a Stockholm-based company developing a peer-to-peer data transfer system. He was also an assistant professor at KTH from 2008 to 2009.

He joined UC Berkeley in 2009 as a visiting scholar and worked with Scott Shenker, Ion Stoica, Michael Franklin, and Matei Zaharia on research in distributed systems, database systems, and networking. During this period he helped start the Apache Mesos and Apache Spark projects. He also co-invented dominant resource fairness (DRF),^[10] a policy for fairly sharing multiple resource types among users; the paper heavily influenced resource management and scheduling design in distributed systems such as Hadoop.^[11]

In 2013, he co-founded Databricks, a company that commercializes Spark, and became chief executive in 2016.^[12]^[13]

Related Research Articles

Scott J. Shenker is an American computer scientist, and professor of computer science at the University of California, Berkeley. He is also the leader of the Extensible Internet Group at the International Computer Science Institute in Berkeley, California.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
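
In the MapReduce model, a map phase emits intermediate key/value pairs, the framework groups (shuffles) those pairs by key, and a reduce phase combines all values that share a key. The single-process Python sketch below traces that data flow for the classic word-count job. It is only an illustration under stated assumptions: the function names are hypothetical and this is not Hadoop's Java API. In a real cluster, many map and reduce tasks run in parallel on different nodes, and the framework re-executes tasks that fail.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit an intermediate (key, value) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value collected for one key into a single result.
    return key, sum(values)


def word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each call to `map_phase` depends only on its own record and each `reduce_phase` call only on one key's values, both phases can be spread across machines; only the shuffle requires moving data between them.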

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

Apache Hive is a data warehouse software project built on top of Apache Hadoop for data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Without Hive, SQL-style queries over distributed data must be implemented in the low-level MapReduce Java API; Hive provides the SQL abstraction needed to run SQL-like queries on the underlying Java framework without writing that low-level code. Since most data warehousing applications work with SQL-based query languages, Hive aids the portability of SQL-based applications to Hadoop. Although initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.

Ion Stoica is a Romanian–American computer scientist specializing in distributed systems, cloud computing and computer networking. He is a professor of computer science at the University of California, Berkeley and co-director of AMPLab. He co-founded Conviva and Databricks with other original developers of Apache Spark.

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by developers from MapR, Drill is inspired by Google's Dremel system. Tom Shiran is the founder of the Apache Drill project, which was designated an Apache Software Foundation top-level project in December 2016.

Matei Zaharia is a Romanian-Canadian computer scientist, educator and the creator of Apache Spark.

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
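
Spark's fault tolerance rests on lineage: a dataset records the transformations that produced it, so a lost partition can be recomputed from its source instead of being restored from a replica. The toy Python sketch below illustrates that idea under stated assumptions. `LineageDataset` is a hypothetical class, not the PySpark API, the source partitions are assumed to be durable, and everything runs in one process.

```python
class LineageDataset:
    """Toy, hypothetical stand-in for a Spark RDD (not the PySpark API).

    Transformations are recorded lazily as lineage; a lost partition is
    rebuilt by replaying that lineage on its durable source partition.
    """

    def __init__(self, source, lineage=()):
        self._source = source          # list of source partitions, assumed durable
        self._lineage = list(lineage)  # ordered (kind, function) steps
        self._cache = {}               # computed partitions; may be lost

    def map(self, fn):
        # Lazy: returns a new dataset with one more lineage step, computes nothing.
        return LineageDataset(self._source, self._lineage + [("map", fn)])

    def filter(self, pred):
        return LineageDataset(self._source, self._lineage + [("filter", pred)])

    def _recompute(self, index):
        # Replay every recorded step on one source partition.
        data = list(self._source[index])
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def partition(self, index):
        # Partitions are computed independently, so they could run in parallel.
        if index not in self._cache:
            self._cache[index] = self._recompute(index)
        return self._cache[index]

    def lose_partition(self, index):
        # Simulate a worker failure that discards one computed partition.
        self._cache.pop(index, None)

    def collect(self):
        return [x for i in range(len(self._source)) for x in self.partition(i)]


if __name__ == "__main__":
    ds = LineageDataset([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20)
    print(ds.collect())   # [30, 40, 50, 60]
    ds.lose_partition(1)  # partition 1 is discarded...
    print(ds.collect())   # ...and rebuilt from lineage: [30, 40, 50, 60]
```

Recording lineage is cheap compared with replicating every intermediate result, which is the trade-off this design makes: recovery costs recomputation time rather than extra storage and network traffic during normal operation.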

Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark.

Apache Mesos is an open-source project to manage computer clusters. It was developed at the University of California, Berkeley.

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases, semi-structured data, unstructured data and binary data. A data lake can be established "on premises" or "in the cloud".

Presto is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata, and allows use of multiple data sources within a query. Presto is community-driven open-source software released under the Apache License.

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets.

Reynold Xin is a computer scientist and engineer specializing in big data, distributed systems, and cloud computing. He is a co-founder and Chief Architect of Databricks. He is best known for his work on Apache Spark, a leading open-source big data project. He was the designer and lead developer of the GraphX, Project Tungsten, and Structured Streaming components and co-designed DataFrames, all of which are part of the core Apache Spark distribution. He also served as release manager for Spark 2.0.

Azure Data Lake is a scalable data storage and analytics service. The service is hosted in Azure, Microsoft's public cloud.

Alluxio is an open-source virtual distributed file system (VDFS). Originally a research project named Tachyon, Alluxio was created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. thesis, advised by Professors Scott Shenker and Ion Stoica. Alluxio sits between computation and storage in the big data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License.
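
One way to picture "many storage systems behind a common interface" is a mount table that maps prefixes of a single logical namespace onto different backing stores. The Python sketch below is a hypothetical illustration of that idea only: `VirtualFileSystem`, `InMemoryStore`, and the longest-prefix mount scheme are assumptions, not Alluxio's API, and the sketch leaves out everything else a real system does (caching, metadata management, writes).

```python
from typing import Dict, Protocol, Tuple


class UnderStore(Protocol):
    # Minimal interface every backing storage system must offer (assumed).
    def read(self, path: str) -> bytes: ...


class InMemoryStore:
    """Hypothetical stand-in for one backing storage system."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files

    def read(self, path: str) -> bytes:
        return self.files[path]


class VirtualFileSystem:
    """Resolves one logical namespace onto several mounted storage systems."""

    def __init__(self) -> None:
        self.mounts: Dict[str, UnderStore] = {}

    def mount(self, prefix: str, store: UnderStore) -> None:
        self.mounts[prefix.rstrip("/")] = store

    def _resolve(self, path: str) -> Tuple[UnderStore, str]:
        # Choose the longest mount prefix that owns this path.
        matches = [p for p in self.mounts if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        # Strip the prefix to get the path inside the backing store.
        return self.mounts[prefix], path[len(prefix):] or "/"

    def read(self, path: str) -> bytes:
        store, inner = self._resolve(path)
        return store.read(inner)


if __name__ == "__main__":
    vfs = VirtualFileSystem()
    vfs.mount("/objects", InMemoryStore({"/logs/a.txt": b"from the object store"}))
    vfs.mount("/hdfs", InMemoryStore({"/data/b.txt": b"from HDFS"}))
    print(vfs.read("/objects/logs/a.txt"))  # b'from the object store'
    print(vfs.read("/hdfs/data/b.txt"))     # b'from HDFS'
```

Applications call only `read` on one namespace; which storage system serves a path is decided by the mount table, so a backing store can be swapped without changing application code.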

Haoyuan (H.Y.) Li is a computer scientist and entrepreneur specializing in distributed systems, big data, and cloud computing. He is best known for proposing the virtual distributed file system (VDFS) and creating the open-source data orchestration system Alluxio. He is the founder, chairman, and CEO of Alluxio, Inc., a company commercializing Alluxio's data orchestration technology. He is also an adjunct professor at Peking University. He speaks frequently at conferences on AI, big data, cloud computing, and open source.

Dominant resource fairness (DRF) is a rule for fair division. It is particularly useful for dividing computing resources among users in cloud computing environments, where each user may require a different combination of resources. DRF was presented by Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker and Ion Stoica in 2011.
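
Under DRF, a user's dominant share is the largest fraction of any single resource that the user has been allocated, and the rule equalizes dominant shares across users. A simple way to realize this is progressive filling: repeatedly give the next task to the user whose dominant share is currently lowest. The Python sketch below assumes each user runs identical tasks with a fixed per-task demand (positive for at least one resource) and that a user whose next task no longer fits is simply skipped; `drf_allocate` and its input format are hypothetical.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Dominant resource fairness via progressive filling (illustrative sketch).

    capacity: dict resource -> total amount available in the cluster.
    demands:  dict user -> dict resource -> amount needed per task.
              Assumes every task demands a positive amount of some resource.
    Returns:  dict user -> number of tasks launched.
    """
    used = {r: 0 for r in capacity}
    alloc = {u: {r: 0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    active = list(demands)  # users whose next task may still fit

    def dominant_share(user):
        # The largest share of any single resource held by this user.
        return max(Fraction(alloc[user][r], capacity[r]) for r in capacity)

    while active:
        # Offer the next task to the user with the lowest dominant share
        # (ties go to the user listed first).
        user = min(active, key=dominant_share)
        need = demands[user]
        if all(used[r] + need.get(r, 0) <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need.get(r, 0)
                alloc[user][r] += need.get(r, 0)
            tasks[user] += 1
        else:
            # This user's next task no longer fits; stop offering to them.
            active.remove(user)
    return tasks


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem": 18}
    users = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
    print(drf_allocate(cluster, users))  # {'A': 3, 'B': 2}
```

In this two-resource example, user A's tasks are memory-heavy and user B's are CPU-heavy. A ends with 3 CPUs and 12 of 18 memory units, B with 6 of 9 CPUs and 2 memory units, so both reach a dominant share of 2/3 and the CPUs are fully used.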

References

[1] Ali Ghodsi, Databricks CEO: A Fortt Knox Conversation. April 12, 2023 – via YouTube.
[2] "Talks at GS with Ali Ghodsi". Goldman Sachs. July 16, 2021.
[3] "Ali Ghodsi". Companies House. Retrieved March 8, 2024.
[4] "Business Insider: The coolest people under 40 in Silicon Valley". Business Insider.
[5] "Spark processing engine more at home in cloud, Databricks CEO says".
[6] Andrea, Guzman (June 13, 2023). "These 13 A.I. innovators are deciding how the tech will change your life".
[7] "Geek Of The Week: Ali Ghodsi, CEO Of Databricks". CloudWedge. October 16, 2019. Retrieved October 19, 2019.
[8] "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center" (PDF).
[9] "Spark SQL: Relational Data Processing in Spark" (PDF).
[10] "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types".
[11] "Hadoop MapReduce Next Generation - Fair Scheduler".
[12] "Former SICS-researcher Ali Ghodsi new CEO of Databricks". RISE website. January 13, 2016. Retrieved May 6, 2017.
[13] "Databricks Announces Changes in Leadership Team to Align With Rapid Growth". Press release. January 11, 2016. Retrieved May 6, 2017.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.