AMPLab

AMPLab was a University of California, Berkeley lab focused on big data analytics, located in Soda Hall. The name stands for the Algorithms, Machines and People Lab. [1] [2] It began publishing papers in 2008 [3] and was officially launched in 2011. [4] The AMPLab was co-directed by Professors Michael J. Franklin, Michael I. Jordan, and Ion Stoica.

While AMPLab worked on a wide variety of big data projects (known as BDAS, the Berkeley Data Analytics Stack [5] ), many know it as the lab that invented Apache Mesos, [6] Apache Spark, [7] and Alluxio. [8]

Berkeley launched RISELab [9] as the successor to AMPLab in 2017. [10]

Related Research Articles

David Bader (computer scientist) American computer scientist

David A. Bader is a Distinguished Professor and Director of the Institute for Data Science at the New Jersey Institute of Technology. Previously, he served as the Chair of the Georgia Institute of Technology School of Computational Science & Engineering, where he was also a founding professor, and the executive director of High-Performance Computing at the Georgia Tech College of Computing. In 2007, he was named the first director of the Sony Toshiba IBM Center of Competence for the Cell Processor at Georgia Tech. Bader has served on the Computing Research Association's Board of Directors, the National Science Foundation's Advisory Committee on Cyberinfrastructure, and on the IEEE Computer Society's Board of Governors. He is an expert in the design and analysis of parallel and multicore algorithms for real-world applications such as those in cybersecurity and computational biology. His main areas of research are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. Bader built the first Linux supercomputer using commodity processors and a high-speed interconnection network.

In computing, a solution stack or software stack is a set of software subsystems or components needed to create a complete platform such that no additional software is needed to support applications. Applications are said to "run on" or "run on top of" the resulting platform.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Linux Foundation Non-profit technology consortium to develop the Linux operating system

The Linux Foundation (LF) is a non-profit technology consortium formed in 2007 through the merger of Open Source Development Labs (founded in 2000) and the Free Standards Group to standardize Linux, support its growth, and promote its commercial adoption. Additionally, it hosts and promotes the collaborative development of open source software projects. It is a major force in promoting diversity and inclusion in both Linux and the wider open source software community.

Michael J. Franklin is an American software entrepreneur and computer scientist specializing in distributed and streaming database technology. He is Liew Family Chair of Computer Science and Chairman of the Department of Computer Science at the University of Chicago.

Vertica Software company

Vertica Systems is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker, with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as later CEOs.

Ion Stoica Romanian-American computer scientist

Ion Stoica is a Romanian-American computer scientist specializing in distributed systems, cloud computing, and computer networking. He is a professor of computer science at the University of California, Berkeley and was a co-director of AMPLab. He co-founded Conviva and Databricks with other original developers of Apache Spark.

Apache Spark Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, Google Drive, and YouTube. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics and machine learning. Registration requires a credit card or bank account details.

Databricks Modern Cloud Data Platform

Databricks is an American enterprise software company founded by the creators of Apache Spark. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.

Apache Mesos Software to manage computer clusters

Apache Mesos is an open-source project to manage computer clusters. It was developed at the University of California, Berkeley.

Ali Ghodsi

Ali Ghodsi is a Swedish-Iranian computer scientist and entrepreneur specializing in distributed systems and big data. He is a co-founder and CEO of Databricks and an adjunct professor at UC Berkeley. Ideas from his academic research in the area of resource management and scheduling and data caching have been applied in popular open source projects such as Apache Mesos, Apache Spark, and Apache Hadoop.

Alibaba Cloud, also known as Aliyun, is a cloud computing company, a subsidiary of Alibaba Group. Alibaba Cloud provides cloud computing services to online businesses and Alibaba's own e-commerce ecosystem. Its international operations are registered and headquartered in Singapore.

Reynold Xin is a computer scientist and engineer specializing in big data, distributed systems, and cloud computing. He is a co-founder and Chief Architect of Databricks. He is best known for his work on Apache Spark, which as of June 2016 was the top open-source big data project. He designed and led development of the GraphX, Project Tungsten, and Structured Streaming components and co-designed DataFrames, all of which are part of the core Apache Spark distribution. He also served as the release manager for Spark's 2.0 release.

D2iQ is an American technology company based in San Francisco, California. It develops software that simplifies Kubernetes lifecycle management and deployment to hybrid, multi-cloud, and edge environments, and that enables advanced application use cases. Its flagship product is called the D2iQ Kubernetes Platform (DKP).

Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It contains a standardized column-oriented memory format that is able to represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware. This reduces or eliminates factors that limit the feasibility of working with large sets of data, such as the cost, volatility, or physical constraints of dynamic random-access memory.

Alluxio is an open-source virtual distributed file system (VDFS). Initially a research project named "Tachyon", Alluxio was created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. thesis, advised by Professors Scott Shenker and Ion Stoica. Alluxio sits between computation and storage in the big data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License.

Haoyuan (H.Y.) Li is a computer scientist and entrepreneur specializing in distributed systems, big data, and cloud computing. He is best known for proposing the Virtual Distributed File System (VDFS) and for creating Alluxio, an open-source data orchestration system. He is the Founder, Chairman, and CEO of Alluxio, Inc., a company commercializing the Alluxio data orchestration technology. He is also an adjunct professor at Peking University and a frequent conference speaker on AI, big data, cloud computing, and open source.

References

  1. "AMPLab Releases Succinct, A New Way to Query Data in Spark". Datanami. 2015-11-11. Retrieved 2016-06-06.
  2. Harris, Derrick. "The lab that created Spark wants to speed up everything, including cures for cancer". Gigaom. Retrieved 2016-06-06.
  3. "Publications | AMPLab – UC Berkeley". AMPLab - UC Berkeley. Retrieved 2018-01-29.
  4. "About". AMPLab - UC Berkeley. Retrieved 2018-01-29.
  5. "BDAS, the Berkeley Data Analytics Stack".
  6. "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center" (PDF).
  7. "Spark: Cluster computing with working sets" (PDF).
  8. "Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks" (PDF).
  9. "RISELab".
  10. "Berkeley launches RISELab, enabling computers to make intelligent real-time decisions".