Company type | Private |
---|---|
Industry | Computer software |
Founded | 2013 |
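Founders | Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin |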
Headquarters | San Francisco, California, United States |
Revenue | $1.6 billion (2023) [1] |
Number of employees | c. 5,500 (2023) [2] |
Website | databricks |
Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark. [3]
The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models. [4]
Databricks pioneered the data lakehouse, a data and AI platform that combines the capabilities of a data warehouse with a data lake, allowing organizations to manage and use both structured and unstructured data for traditional business analytics and AI workloads. [5]
Databricks acquired MosaicML for $1.4 billion in June 2023, its largest acquisition. [6]
In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, which combines its lakehouse architecture with MosaicML's generative AI technology to help customers better understand and use their own proprietary data. [7]
The company develops Delta Lake, an open-source project to bring reliability to data lakes for machine learning and other data science use cases. [8]
Databricks grew out of the AMPLab at the University of California, Berkeley, where Apache Spark, an open-source distributed computing framework written in Scala, was created. The company was founded by Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, [9] Patrick Wendell, and Reynold Xin.
In November 2017, Azure Databricks was announced, making Databricks available as a first-party service on Microsoft Azure. [10]
In June 2020, Databricks acquired Redash, an open-source tool designed to help data scientists and analysts visualize and build interactive dashboards of their data. [11]
In February 2021, Databricks and Google Cloud announced integrations with Google Kubernetes Engine and Google's BigQuery platform. [12] Fortune ranked Databricks among the best large "Workplaces for Millennials" in 2021. [13] At the time, the company said more than 5,000 organizations used its products. [14]
In August 2021, Databricks completed its eighth funding round, raising $1.6 billion at a valuation of $38 billion. [15]
In October 2021, Databricks made its second acquisition, buying the German no-code company 8080 Labs, maker of bamboolib, a data exploration tool that can be used without writing code. [16]
In March 2023, responding to the popularity of OpenAI's ChatGPT, the company introduced Dolly, an open-source language model named after Dolly the sheep, which developers could use to build their own chatbots. The model was described as producing results similar to ChatGPT's while using fewer parameters, but Databricks had not released formal benchmarks showing whether it actually matched ChatGPT's performance. [17] [18] [19]
Databricks acquired the data security startup Okera in May 2023 to extend its data governance capabilities. [20] The next month, it acquired MosaicML, an open-source generative AI startup, for $1.4 billion. [21] [22]
In October 2023, Databricks acquired data replication startup Arcion for $100 million. [23]
Databricks reported $1.6 billion in revenue for the 2023 fiscal year, more than double the previous year's figure. [24]
In September 2013, Databricks announced that it had raised $13.9 million from Andreessen Horowitz and said it aimed to offer an alternative to Google's MapReduce system. [25] [26] Microsoft invested in Databricks in 2019, participating in the company's Series E for an undisclosed amount. [27] [28] By February 2021, the company had raised $1.9 billion in funding, including a $1 billion Series G led by Franklin Templeton at a $28 billion post-money valuation. Other investors include Amazon Web Services, CapitalG (a growth equity firm under Alphabet Inc.) and Salesforce Ventures. [14]
Series | Date | Amount (US$ millions) | Lead investors |
---|---|---|---|
A | 2013 | 13.9 [25] | Andreessen Horowitz |
B | 2014 | 33 [29] | New Enterprise Associates |
C | 2016 | 60 [30] | New Enterprise Associates |
D | 2017 | 140 [31] | Andreessen Horowitz |
E | Feb. 2019 | 250 [32] | Andreessen Horowitz |
F | Oct. 2019 | 400 [33] | Andreessen Horowitz |
G | Jan. 2021 | 1,000 [34] | Franklin Templeton Investments |
H | Aug. 2021 | 1,600 [35] | Morgan Stanley |
I | Sep. 2023 | 500 [36] | Capital One Ventures, Nvidia |
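Adding up the rounds in the table gives a quick consistency check on the funding totals quoted in the text. The minimal Python sketch below simply accumulates the listed amounts; it assumes the table is complete and ignores undisclosed contributions (such as Microsoft's Series E participation) and any secondary transactions.

```python
# Funding rounds as listed in the table above (amounts in US$ millions).
ROUNDS = [
    ("A", 13.9),
    ("B", 33),
    ("C", 60),
    ("D", 140),
    ("E", 250),
    ("F", 400),
    ("G", 1000),
    ("H", 1600),
    ("I", 500),
]


def cumulative_totals(rounds):
    """Return (series, running total in US$ millions) after each round."""
    totals = []
    running = 0.0
    for series, amount in rounds:
        running += amount
        # Round to one decimal place to hide floating-point noise.
        totals.append((series, round(running, 1)))
    return totals


for series, total in cumulative_totals(ROUNDS):
    print(f"after Series {series}: ${total:,.1f} million")
```

The running total after Series G comes to about $1,897 million, in line with the roughly $1.9 billion raised by early 2021, while the listed rounds through Series I sum to about $4.0 billion.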
Databricks develops and sells a cloud data platform using the marketing term "lakehouse", a portmanteau of "data warehouse" and "data lake". [37] Databricks' lakehouse is based on the open-source Apache Spark framework, which allows analytical queries against semi-structured data without a traditional database schema. [38] In October 2022, the Databricks Lakehouse Platform received FedRAMP authorization for use by the U.S. federal government and its contractors. [39]
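Querying semi-structured data without a predefined schema is often called schema-on-read: the structure is inferred when the data is read rather than declared up front. The stdlib-only Python sketch below illustrates that idea on a few JSON records. It is not Spark's implementation; the records and field names are invented for illustration, although Spark's JSON reader performs a comparable inference step when no schema is supplied.

```python
import json

# Hypothetical newline-delimited JSON events; the records do not share one fixed set of fields.
RAW = """\
{"user": "a", "event": "click", "ms": 120}
{"user": "b", "event": "view"}
{"user": "a", "event": "click", "ms": 80, "device": "mobile"}
"""


def infer_schema(records):
    """Map each field name seen in any record to the set of Python type names observed for it."""
    schema = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def read_with_schema(text):
    """Parse the records, infer a schema, and fill fields a record lacks with None."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    schema = infer_schema(records)
    rows = [{field: rec.get(field) for field in schema} for rec in records]
    return schema, rows


schema, rows = read_with_schema(RAW)
# An analytical query over the inferred structure: total latency of "click" events.
total_click_ms = sum(r["ms"] or 0 for r in rows if r["event"] == "click")
print(sorted(schema), total_click_ms)
```

The schema here is just the union of fields found in the data, with missing values represented as `None`; a production engine would also reconcile conflicting types and sample large inputs rather than scanning everything.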
Databricks' Delta Engine launched in June 2020 as a new query engine that layers on top of Delta Lake to boost query performance. [40] It is compatible with Apache Spark and MLflow, which are also open source projects Databricks employees helped create. [41]
In November 2020, Databricks introduced SQL Analytics (later renamed Databricks SQL) for running business intelligence and analytics reporting on top of data lakes. Analysts can query data sets directly with standard SQL or use product connectors to integrate with business intelligence tools such as Holistics, Tableau, Qlik, Sigma Computing, Looker, and ThoughtSpot. [42]
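As an illustration of the kind of standard SQL an analyst might run for reporting, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for a SQL warehouse. The `sales` table, its columns and its rows are hypothetical, and nothing here uses Databricks' own connectors or endpoints.

```python
import sqlite3

# In-memory database standing in for a warehouse table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EMEA", "x", 100.0),
        ("EMEA", "y", 50.0),
        ("APAC", "x", 70.0),
        ("AMER", "y", 200.0),
    ],
)

# A typical BI-style aggregation: revenue per region, largest first.
query = """
    SELECT region, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
"""
report = conn.execute(query).fetchall()
conn.close()

for region, revenue in report:
    print(f"{region}: {revenue:.2f}")
```

Because the query is plain SQL, the same statement could in principle be issued from a BI tool or any client that speaks the warehouse's dialect; dialect differences (for example in date functions) are not covered by this sketch.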
The Databricks platform also supports other workloads, including machine learning, data storage and processing, streaming analytics, and business intelligence. [43]
The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning. [44] [45] In addition to building the Databricks platform, the company has co-organized massive open online courses about Spark [46] and a conference for the Spark community called the Data + AI Summit, [47] formerly known as Spark Summit.
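Delta Lake's reliability rests on a transaction log kept alongside a table's data files: each commit records actions such as adding or removing a data file, and readers reconstruct a table version by replaying commits in order. The Python sketch below is a heavily simplified, in-memory illustration of that replay step. Real Delta logs are JSON files in a `_delta_log` directory with further action types (metadata, protocol, commit info) and periodic checkpoints, all ignored here; the file names are hypothetical.

```python
import json

# Simplified commits in version order (0, 1, 2, ...); each line is one JSON action.
COMMITS = [
    # version 0: initial write of two data files
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    # version 1: an update rewrites part-000 into a new file
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
    # version 2: an append adds one more file
    '{"add": {"path": "part-003.parquet"}}',
]


def snapshot(commits, version=None):
    """Return the set of live data files as of `version` (the latest version if None)."""
    last = len(commits) - 1 if version is None else version
    live = set()
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


print(sorted(snapshot(COMMITS)))             # current table
print(sorted(snapshot(COMMITS, version=0)))  # an older version
```

Reading an older version, as in the second call, is the idea behind Delta Lake's time travel feature, and because a reader only ever replays whole commits, it never sees a half-applied write.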
In early 2024, Databricks released a portfolio of tools to help customers customize, fine-tune or build their own AI systems. These include Mosaic AI Vector Search, which enables companies to build retrieval-augmented generation (RAG) applications; Mosaic AI Model Serving, a unified service for deploying, governing, querying and monitoring models fine-tuned or pre-deployed by Databricks; and Mosaic AI Pretraining, a platform for enterprises to create their own LLMs. [48]
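The retrieval step behind RAG can be sketched as nearest-neighbour search over embeddings: documents and the query are mapped to vectors, the closest documents are found with a similarity measure such as cosine similarity, and their text is placed into the model's prompt. The Python sketch below uses tiny hand-made vectors in place of a real embedding model and a brute-force scan in place of an index. It does not use or reproduce the Mosaic AI Vector Search API; all names and data are hypothetical.

```python
import math

# Hypothetical 3-dimensional "embeddings"; a real system would get these from an embedding model.
DOCS = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return a damaged item": [0.8, 0.2, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, docs, k=2):
    """Brute-force nearest neighbours: rank every document by similarity to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:k]


def build_prompt(question, query_vec, docs, k=2):
    """Place the retrieved passages ahead of the question, as a RAG pipeline would."""
    context = "\n".join(f"- {name}" for name in top_k(query_vec, docs, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How do I get my money back?", [1.0, 0.0, 0.0], DOCS))
```

A production vector search service would replace the linear scan with an approximate nearest-neighbour index so that retrieval stays fast over millions of documents.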
In March 2024, Databricks released DBRX, an open source foundation model. It relies on a mixture-of-experts architecture and is built on the MegaBlocks open source project. [49]
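In a mixture-of-experts layer, a small router scores every expert for each token, only the top-k experts actually run, and their outputs are combined using the router's normalized scores. The stdlib Python sketch below shows that generic top-k gating step with toy "experts". It is an illustration under stated assumptions, not DBRX's or MegaBlocks' code: the expert count, k, hidden size, router weights and expert functions are all arbitrary choices.

```python
import math
import random

NUM_EXPERTS = 16  # illustrative number of experts
TOP_K = 4         # experts evaluated per token (illustrative)
DIM = 8           # toy hidden size

rng = random.Random(0)
# Router: one weight vector per expert; its dot product with a token gives that expert's score.
ROUTER = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Toy experts: expert i scales its input by (i + 1); real experts are feed-forward networks.
EXPERTS = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]


def moe_layer(token):
    """Route one token to its top-k experts and mix their outputs."""
    scores = [sum(w * t for w, t in zip(weights, token)) for weights in ROUTER]
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the chosen experts' scores only, so the mixing weights sum to 1.
    m = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    gates = {i: exps[i] / total for i in chosen}
    out = [0.0] * len(token)
    for i in chosen:  # only TOP_K of the NUM_EXPERTS experts are evaluated
        for j, v in enumerate(EXPERTS[i](token)):
            out[j] += gates[i] * v
    return out, chosen, gates


token = [rng.gauss(0, 1) for _ in range(DIM)]
output, chosen, gates = moe_layer(token)
print("experts used:", chosen)
```

The design point is that compute per token scales with `TOP_K`, not `NUM_EXPERTS`, so a model can hold many more parameters than it uses for any single token.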
DBRX cost $10 million to create. At launch, Databricks described it as the fastest open-source LLM and reported that it outperformed other models, such as Llama 2, on commonly used industry benchmarks covering tasks including logic puzzles and general knowledge questions. Although it has 132 billion parameters in total, only 36 billion of them are active for any given token. [50]
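The gap between total and active parameters can be quantified directly. Using 132 billion total parameters (the figure given for the released model) and 36 billion active per token, roughly 27% of the weights take part in processing each token. The short Python calculation below also estimates per-token forward-pass compute with the common rule of thumb of about 2 floating-point operations per active parameter; that rule is an assumed approximation, not a Databricks figure.

```python
TOTAL_PARAMS = 132e9   # total parameters in the released DBRX model
ACTIVE_PARAMS = 36e9   # parameters active for each token
FLOPS_PER_PARAM = 2    # assumed rule of thumb: ~1 multiply + 1 add per active weight


def active_fraction(total, active):
    """Share of the model's weights that participate in processing one token."""
    return active / total


def forward_flops_per_token(active, flops_per_param=FLOPS_PER_PARAM):
    """Rough forward-pass compute per token under the assumed rule of thumb."""
    return flops_per_param * active


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
flops = forward_flops_per_token(ACTIVE_PARAMS)
print(f"active fraction: {frac:.1%}")
print(f"approx. forward FLOPs per token: {flops:.1e}")
```

Under these assumptions the per-token compute is that of a roughly 36-billion-parameter dense model, even though all 132 billion parameters must still be held in memory.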
DBRX can also serve as a foundation for companies to build or customize their own AI models, using their proprietary data to produce higher-quality outputs for specific use cases. [51]
Databricks is headquartered in San Francisco. [52] It also has operations in Canada, the United Kingdom, the Netherlands, Singapore, Australia, Germany, France, Japan, China, South Korea, India, Brazil, Switzerland, Costa Rica and Serbia. [53]