Company type | Private |
---|---|
Industry | Computer software |
Founded | 2013[1] |
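Founders | Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin |
Headquarters | San Francisco, United States |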
Key people |
|
Revenue | $1.6 billion (2023) [2] |
Number of employees | c. 8,000 (2025) [3] |
Website | databricks |
Databricks, Inc. is a global data, analytics, and artificial intelligence (AI) company, founded in 2013 by the original creators of Apache Spark. [1] [4] The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models. [5]
Databricks pioneered the data lakehouse, a data and AI platform that combines the capabilities of a data warehouse with a data lake, allowing organizations to manage and use both structured and unstructured data for traditional business analytics and AI workloads. [6] The company also develops Delta Lake, an open-source project to bring reliability to data lakes for machine learning and other data science use cases. [7]
Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in creating Apache Spark, an open-source distributed computing framework built on Scala. The company was founded by Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, [8] Patrick Wendell, and Reynold Xin. [citation needed]
In November 2017, Databricks was announced as a first-party service on Microsoft Azure through the Azure Databricks integration. [9]
In February 2021, together with Google Cloud, Databricks provided integration with the Google Kubernetes Engine and Google's BigQuery platform. [10] At that time, the company said more than 5,000 organizations used its products. [11]
Fortune ranked Databricks as one of the "Best Large Workplaces for Millennials" in 2021. [12]
In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML's generative AI technology to help customers better understand and use their own proprietary data. [13]
The firm was valued at $62 billion in December 2024. [14]
In June 2020, Databricks bought Redash, an open-source tool for data visualization and building interactive dashboards. [15] In 2021, it bought the German no-code company 8080 Labs, whose product, bamboolib, allowed data exploration without any coding. [16] In May 2023, Databricks bought the data security group Okera, extending Databricks' data governance capabilities. [17] In June 2023, it bought the open-source generative AI startup MosaicML for $1.4 billion. [18] [19] In October 2023, Databricks bought the data replication startup Arcion for $100 million. [20] In what is believed to be its sixth acquisition, Databricks bought Tabular, a data-management system used by open-source AI, for over $1 billion. [21]
In March 2023, in response to the popularity of OpenAI's ChatGPT, the company introduced an open-source language model, named Dolly after Dolly the sheep, that allowed developers to create chatbots. Dolly uses fewer parameters than ChatGPT to produce similar results, but Databricks had not released formal benchmark tests to show whether its bot actually matched the performance of ChatGPT. [22] [23] [24]
Databricks reported $1.6 billion in revenue for the 2023 fiscal year, more than doubling its previous level. [25]
In September 2013, Databricks announced it had raised $13.9 million from Andreessen Horowitz and said it aimed to offer an alternative to Google's MapReduce system. [26] [27] Microsoft was a noted investor in Databricks in 2019, participating in the company's Series E for an unspecified amount. [28] [29] The company has raised $1.9 billion in funding, including a $1 billion Series G led by Franklin Templeton at a $28 billion post-money valuation in February 2021. Other investors include Amazon Web Services, CapitalG (a growth equity firm under Alphabet Inc.) and Salesforce Ventures. [11] In August 2021, Databricks finished its eighth round of funding by raising $1.6 billion, valuing the company at $38 billion. [30] In December 2024, Databricks announced one of the largest funding rounds in history, a $10 billion financing at a valuation of $62 billion. [14]
Series | Date | Amount (million $) | Lead investors |
---|---|---|---|
A | 2013 | 13.9 [26] | Andreessen Horowitz |
B | 2014 | 33 [31] | New Enterprise Associates |
C | 2016 | 60 [32] | New Enterprise Associates |
D | 2017 | 140 [33] | Andreessen Horowitz |
E | Feb. 2019 | 250 [34] | Andreessen Horowitz |
F | Oct. 2019 | 400 [35] | Andreessen Horowitz |
G | Jan. 2021 | 1,000 [36] | Franklin Templeton Investments |
H | Aug. 2021 | 1,600 [37] | Morgan Stanley |
I | Sep. 2023 | 500 [38] | Capital One Ventures, Nvidia |
J | Dec. 2024 | 10,000 [39] | Thrive Capital |
Databricks develops and sells a cloud data platform using the marketing term "lakehouse", a portmanteau of "data warehouse" and "data lake". [40] Databricks' Lakehouse is based on the open-source Apache Spark framework that allows analytical queries against semi-structured data without a traditional database schema. [41] In October 2022, Lakehouse received FedRAMP authorized status for use with the U.S. federal government and contractors. [42]
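The idea of querying semi-structured data without a predeclared schema can be illustrated with a minimal sketch in plain Python. This is not Spark's API or Databricks code; the sample records and the helper names `read_records`, `infer_fields`, and `count_by` are hypothetical and exist only for illustration.

```python
import json
from collections import defaultdict

# Hypothetical semi-structured event records: the fields vary from line to line.
RAW_LINES = [
    '{"user": "a", "action": "click", "ms": 120}',
    '{"user": "b", "action": "view"}',
    '{"user": "a", "action": "click", "ms": 80, "meta": {"page": "home"}}',
]


def read_records(lines):
    """Parse each line when it is read; no table schema is declared in advance."""
    return [json.loads(line) for line in lines]


def infer_fields(records):
    """Collect the union of top-level field names actually seen in the data."""
    fields = set()
    for record in records:
        fields.update(record)
    return sorted(fields)


def count_by(records, key):
    """Group and count on one field, skipping records that lack it."""
    counts = defaultdict(int)
    for record in records:
        if key in record:
            counts[record[key]] += 1
    return dict(counts)


records = read_records(RAW_LINES)
fields = infer_fields(records)
events_per_action = count_by(records, "action")
```

Here the set of fields is discovered from the data at read time, and the aggregation tolerates records that omit a field, which is the property the paragraph above attributes to schema-less analytical queries.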
The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning. [43] [44]
In June 2020, Databricks launched Delta Engine, a fast query engine for Delta Lake, [45] compatible with Apache Spark and MLflow. [46]
In November 2020, Databricks introduced Databricks SQL (previously called SQL Analytics) for running business intelligence and analytics reporting on top of data lakes. Analysts can query data sets with standard SQL or use connectors to integrate with business intelligence tools like Holistics, [47] Tableau, Qlik, Sigma Computing, [48] Looker, and ThoughtSpot. [49]
Databricks offers a platform for other workloads, including machine learning, data storage and processing, streaming analytics, and business intelligence. [50]
In early 2024, Databricks released the Mosaic set of tools for customizing, fine-tuning and building AI systems. It includes AI Vector Search for building RAG models; AI Model Serving, a service for deploying, governing, querying and monitoring models fine-tuned or pre-deployed by Databricks; and AI Pretraining, a platform for enterprises to create their own LLMs. [51]
In March 2024, Databricks released DBRX, an open-source foundation model. It has a mixture-of-experts architecture and is built on the MegaBlocks open-source project. [52] DBRX cost $10 million to create. At the time of launch, it was the fastest open-source LLM, based on commonly used industry benchmarks. It beat other models like Llama 2 at solving logic puzzles and answering general knowledge questions, among other tasks. Although it has 136 billion parameters, it uses only 36 billion, on average, to generate outputs. [53] DBRX also serves as a foundation for companies to build or customize their own AI models. Companies can also use proprietary data to generate higher-quality outputs for specific use cases. [54]
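Why a mixture-of-experts model uses only part of its parameters per output can be shown with a minimal top-k routing sketch. It is a generic toy illustration, not DBRX's or MegaBlocks' implementation: the eight scalar "experts", the fixed router scores, and the function names `top_k_route` and `moe_forward` are all assumptions made for this example.

```python
import math


def top_k_route(scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]  # softmax over selected experts only
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate, k):
    """Run only the selected experts on x and mix their outputs by router weight."""
    routed = top_k_route(gate(x), k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Eight toy experts; each simply scales its input (hypothetical stand-ins for sub-networks).
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]


def gate(x):
    # Hypothetical fixed router scores; a real router is a learned layer that depends on x.
    return [0.1, 2.0, 0.3, 1.5, -1.0, 0.0, 0.7, -0.5]


output, active = moe_forward(3.0, experts, gate, k=2)

# Share of parameters used per output, from the figures quoted above (36 of 136 billion).
active_share = 36 / 136
```

With `k=2`, only two of the eight toy experts are evaluated for the input, so the compute per output scales with the active experts rather than with the full model. Using the figures cited above, `active_share` comes to roughly 0.26.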
In addition to building the Databricks platform, the company has co-organized massive open online courses about Spark [55] and a conference for the Spark community called the Data + AI Summit, [56] formerly known as Spark Summit.[ citation needed ]
In December 2024, Databricks, along with Wiz and Workday, decided to make its products available on AWS through the new "Buy with AWS" button. [57]
Databricks is headquartered in San Francisco. [58] It also has operations in Canada, the United Kingdom, and elsewhere. [59]