Databricks


Databricks, Inc.
Company type: Private
Industry: Computer software
Founded: 2013
Founders: Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin
Headquarters: San Francisco, California, United States
Key people:
  • Ali Ghodsi (CEO)
  • Ion Stoica (Executive chairman)
Revenue: $1.6 billion (2023) [1]
Number of employees: c. 5,500 (2023) [2]
Website: databricks.com

Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark. [3]


The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models. [4]

Databricks pioneered the data lakehouse, a data and AI platform that combines the capabilities of a data warehouse with a data lake, allowing organizations to manage and use both structured and unstructured data for traditional business analytics and AI workloads. [5]

In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML's generative AI technology to help customers better understand and use their own proprietary data. [6]

The company develops Delta Lake, an open-source project to bring reliability to data lakes for machine learning and other data science use cases. [7]

History

Databricks booth

Databricks grew out of the AMPLab at the University of California, Berkeley, the research lab where Apache Spark, an open-source distributed computing framework written in Scala, was created. The company was founded by Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, [8] Patrick Wendell, and Reynold Xin.

In November 2017, Microsoft announced Azure Databricks, making Databricks a first-party service on Microsoft Azure. [9] In February 2021, Databricks and Google Cloud announced integration with Google Kubernetes Engine and Google's BigQuery platform. [10] By that time, the company said more than 5,000 organizations used its products. [11]

Fortune ranked Databricks as one of the best large "Workplaces for Millennials" in 2021. [12]

Acquisitions

Much of the company's expansion has come through acquisitions. In June 2020, it acquired Redash, an open-source tool designed to help data scientists and analysts visualize their data and build interactive dashboards. [13] Its second acquisition, in October 2021, was the German no-code company 8080 Labs, maker of bamboolib, a data exploration tool that requires no coding to use. [14] Its third acquisition came in May 2023, when it bought the data security company Okera to extend its data governance capabilities. [15] The next month, it bought the open-source generative AI startup MosaicML for $1.3 billion. [16] [17] In October 2023, Databricks acquired the data replication startup Arcion for $100 million. [18] In June 2024, in what is believed to be its sixth acquisition, the company agreed to buy the data management firm Tabular for over $1 billion. [19]

In March 2023, in response to the popularity of OpenAI's ChatGPT, the company introduced Dolly, an open-source language model named after Dolly the sheep, which developers could use to build their own chatbots. Databricks said the model used fewer parameters than ChatGPT while producing similar results, but it had not released formal benchmark tests showing whether Dolly actually matched ChatGPT's performance. [20] [21] [22]

Databricks reported $1.6 billion in revenue for the 2023 fiscal year, more than doubling its previous level. [23]

Funding

In September 2013, Databricks announced that it had raised $13.9 million from Andreessen Horowitz and said it aimed to offer an alternative to Google's MapReduce system. [24] [25] Microsoft invested in Databricks in 2019, participating in the company's Series E round for an undisclosed amount. [26] [27] As of February 2021, the company had raised $1.9 billion in funding, including a $1 billion Series G round led by Franklin Templeton at a $28 billion post-money valuation that month. Other investors include Amazon Web Services, CapitalG (a growth equity firm owned by Alphabet Inc.) and Salesforce Ventures. [11] In August 2021, Databricks closed its eighth funding round, raising $1.6 billion at a $38 billion valuation. [28]

Funding rounds
Series | Date | Amount (million US$) | Lead investors
A | 2013 | 13.9 [24] | Andreessen Horowitz
B | 2014 | 33 [29] | New Enterprise Associates
C | 2016 | 60 [30] | New Enterprise Associates
D | 2017 | 140 [31] | Andreessen Horowitz
E | Feb. 2019 | 250 [32] | Andreessen Horowitz
F | Oct. 2019 | 400 [33] | Andreessen Horowitz
G | Jan. 2021 | 1,000 [34] | Franklin Templeton Investments
H | Aug. 2021 | 1,600 [35] | Morgan Stanley
I | Sep. 2023 | 500 [36] | Capital One Ventures, Nvidia

Products

Databricks develops and sells a cloud data platform under the marketing term "lakehouse", a portmanteau of "data warehouse" and "data lake". [37] Databricks' lakehouse is based on the open-source Apache Spark framework, which allows analytical queries against semi-structured data without a traditional database schema. [38] In October 2022, Databricks' lakehouse platform received FedRAMP Authorized status for use by the U.S. federal government and its contractors. [39]

Databricks' Delta Engine launched in June 2020 as a new query engine that layers on top of Delta Lake to boost query performance. [40] It is compatible with Apache Spark and MLflow, which are also open source projects Databricks employees helped create. [41]

In November 2020, Databricks introduced Databricks SQL (previously known as SQL Analytics) for running business intelligence and analytics reporting on top of data lakes. Analysts can query data sets directly with standard SQL or use product connectors to integrate with business intelligence tools such as Holistics, Tableau, Qlik, Sigma Computing, Looker, and ThoughtSpot. [42]

The Databricks platform also supports other workloads, including machine learning, data storage and processing, streaming analytics, and business intelligence. [43]

The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning. [44] [45] In addition to building the Databricks platform, the company has co-organized massive open online courses about Spark [46] and a conference for the Spark community called the Data + AI Summit, [47] formerly known as Spark Summit.

In early 2024, Databricks released a portfolio of new tools to help customers customize, fine-tune or build their own AI systems. These include Mosaic AI Vector Search, which enables companies to build retrieval-augmented generation (RAG) applications; Mosaic AI Model Serving, a unified service for deploying, governing, querying and monitoring models fine-tuned or pre-deployed by Databricks; and Mosaic AI Pretraining, a platform for enterprises to create their own large language models (LLMs). [48]

In March 2024, Databricks released DBRX, an open source foundation model. It relies on a mixture-of-experts architecture and is built on the MegaBlocks open source project. [49]

DBRX cost $10 million to create. At launch, it was the best-performing open-source LLM on commonly used industry benchmarks, beating models such as Llama 2 at solving logic puzzles and answering general knowledge questions, among other tasks. Although DBRX has 132 billion parameters in total, only 36 billion of them are active for any given input token. [50]
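The gap between DBRX's total and active parameters comes from mixture-of-experts routing: a small router scores every expert for each token, and only the highest-scoring few are actually run. The Python sketch below is a toy illustration of generic top-k routing, not DBRX's implementation. The expert functions, the dot-product router, and the sizes (16 experts, 4 chosen per token, 8-dimensional token vectors) are hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale, calls, idx):
    """Hypothetical toy expert: scales its input and records that it ran."""
    def expert(x):
        calls.append(idx)
        return [scale * v for v in x]
    return expert


def moe_forward(x, experts, gate_weights, k):
    """Route one token vector x to its top-k experts and mix their outputs."""
    # Router (assumed form): one logit per expert, the dot product of x with
    # that expert's gate vector.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    # Keep only the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate weights over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for i, w in zip(chosen, weights):
        y = experts[i](x)  # experts that were not chosen are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen, weights


random.seed(0)
N_EXPERTS, TOP_K, DIM = 16, 4, 8  # illustrative sizes, not read from DBRX
calls = []
experts = [make_expert(1.0 + 0.1 * i, calls, i) for i in range(N_EXPERTS)]
gate_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(DIM)]

out, chosen, weights = moe_forward(token, experts, gate_weights, TOP_K)
print("experts run:", sorted(calls))
print(f"fraction of experts active: {len(calls) / N_EXPERTS:.2f}")
```

Running it prints which four experts handled the token and an active fraction of 0.25. That fraction is not meant to reproduce DBRX's ratio of roughly 27% (36 of 132 billion parameters), because a real model also has attention and embedding parameters that every token uses regardless of routing.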

DBRX also serves as a foundation for companies to build or customize their own AI models. Companies can also train it on their proprietary data to generate higher-quality outputs for specific use cases. [51]

Operations

Databricks is headquartered in San Francisco. [52] It also has operations in Canada, the United Kingdom, and elsewhere. [53]


References

  1. Lin, Belle (March 6, 2024). "AI is Driving Record Sales at Multibillion-Dollar Databricks. An IPO Can Wait …". The Wall Street Journal. Archived from the original on March 6, 2024.
  2. Driebusch, Corrie (July 29, 2023). "The Tech CEO Who Uses His Phone the Old-Fashioned Way". The Wall Street Journal. Archived from the original on February 28, 2024.
  3. Saul, Derek (September 14, 2023). "Top IPO Prospect Databricks Scores $43 Billion Valuation Thanks To $500 Million Funding Round Including AI Titan Nvidia". Forbes. Archived from the original on September 4, 2024. Retrieved March 26, 2024.
  4. Sullivan, Mark (March 19, 2024). "How Databricks is helping customers develop their own customized AI models". Fast Company. Retrieved March 19, 2024.
  5. Clark, Lindsay (November 16, 2023). "Databricks' lakehouse becomes foundation under fresh layer of AI dreams". The Register. Archived from the original on September 4, 2024. Retrieved November 16, 2023.
  6. Cai, Kenrick (November 16, 2023). "Databricks' New AI Product Adds A ChatGPT-Like Interface To Its Software". Forbes. Archived from the original on September 4, 2024. Retrieved November 16, 2023.
  7. "Databricks launches Delta Lake, an open source data lake reliability project". VentureBeat. April 24, 2019. Archived from the original on March 24, 2022. Retrieved April 6, 2021.
  8. Zaharia, Matei. "Matei Zaharia". Archived from the original on March 10, 2014. Retrieved August 16, 2016.
  9. "Microsoft makes Databricks a first-party service on Azure". TechCrunch. November 15, 2017. Archived from the original on September 4, 2024. Retrieved April 6, 2021.
  10. "Databricks brings its lakehouse to Google Cloud". TechCrunch. February 17, 2021. Archived from the original on September 4, 2024. Retrieved February 18, 2021.
  11. Konrad, Alex (February 2, 2021). "Databricks Raises $1 Billion At $28 Billion Valuation, With The Cloud's Elite All Buying In". Forbes. Archived from the original on February 1, 2021. Retrieved July 29, 2021.
  12. "100 Best Large Workplaces for Millennials". Fortune. June 16, 2021. Archived from the original on March 24, 2022. Retrieved July 16, 2021.
  13. "Databricks acquires Redash, a visualizations service for data scientists". TechCrunch. June 24, 2020. Retrieved April 6, 2021.
  14. Rosenbaum, Eric (October 6, 2021). "$38 billion software start-up Databricks makes acquisition to leave code behind". CNBC. Archived from the original on October 6, 2021. Retrieved February 20, 2022.
  15. Palazzolo, Stephanie (May 3, 2023). "Exclusive: $38 billion data and AI darling Databricks acquires security startup Okera". Business Insider. Archived from the original on May 3, 2023.
  16. Datta, Tiyashi; Hu, Krystal (June 26, 2023). "Databricks strikes $1.3 billion deal for generative AI startup MosaicML". Reuters. Archived from the original on June 26, 2023. Retrieved June 27, 2023.
  17. Council, Stephen (June 26, 2023). "SF tech firm Databricks to buy 2-year-old startup for $21 million per employee". SFGATE. Archived from the original on June 26, 2023. Retrieved June 27, 2023.
  18. "After $43B valuation, Databricks acquires data replication startup Arcion for $100M". TechCrunch. October 23, 2023. Retrieved October 23, 2023.
  19. Galloni, Alessandra, ed. (June 5, 2024). "Databricks to buy data management firm Tabular for over $1 bln". Reuters.
  20. Hu, Krystal; Nellis, Stephen (March 24, 2023). "Databricks pushes open-source chatbot as cheaper ChatGPT alternative". Reuters. Archived from the original on March 25, 2023.
  21. Loten, Angus (March 24, 2023). "Databricks Launches 'Dolly,' Another ChatGPT Rival". The Wall Street Journal. Archived from the original on March 24, 2023.
  22. Goldman, Sharon (March 24, 2023). "Databricks debuts ChatGPT-like Dolly, a clone any enterprise can own". VentureBeat. Archived from the original on April 11, 2023.
  23. Miller, Ron; Wilhelm, Alex (March 7, 2024). "Databricks keeps marching forward with $1.6B in revenue". TechCrunch. Archived from the original on March 12, 2024. Retrieved March 8, 2024.
  24. Harris, Derrick (September 25, 2013). "Databricks raises $14M from Andreessen Horowitz, wants to take on MapReduce with Spark". Archived from the original on January 15, 2022. Retrieved September 28, 2014.
  25. Lorica, Ben (September 25, 2013). "Databricks aims to build next-generation analytic tools for Big Data". O'Reilly Media. Archived from the original on July 4, 2014. Retrieved September 28, 2014.
  26. "Databricks raises $250M at a $2.75B valuation for its analytics platform". TechCrunch. February 5, 2019. Archived from the original on September 4, 2024. Retrieved April 8, 2021.
  27. Novet, Jordan (February 5, 2019). "Microsoft used to scare start-ups but is now an 'outstandingly good partner,' says Silicon Valley investor Ben Horowitz". CNBC. Archived from the original on February 5, 2019. Retrieved April 6, 2021.
  28. Mellor, Chris (September 1, 2021). "Databricks raises data lake of cash at monstrous $38bn valuation". Blocks & Files. Archived from the original on September 1, 2021. Retrieved September 4, 2021.
  29. Miller, Ron (June 30, 2014). "Databricks Snags $33M In Series B And Debuts Cloud Platform For Processing Big Data". TechCrunch. Archived from the original on July 1, 2014. Retrieved September 28, 2014.
  30. Shieber, Jonathan (December 15, 2016). "Databricks raises $60 million to be big data's next great leap forward". TechCrunch. Archived from the original on December 15, 2016. Retrieved December 16, 2016.
  31. "Databricks Secures $140 Million to Accelerate Analytics and Artificial Intelligence in the Enterprise". Databricks. August 22, 2017. Archived from the original on January 13, 2022. Retrieved May 16, 2019.
  32. "Databricks' $250 Million Funding Supports Explosive Growth and Global Demand for Unified Analytics; Brings Valuation to $2.75 Billion". Databricks. February 5, 2019. Archived from the original on January 15, 2022. Retrieved February 5, 2019.
  33. "Databricks announces $400M round on $6.2B valuation as analytics platform continues to grow". TechCrunch. October 22, 2019. Archived from the original on September 4, 2024. Retrieved October 24, 2019.
  34. "Databricks raises $1B at $28B valuation as it reaches $425M ARR". TechCrunch. February 2021. Archived from the original on November 3, 2021. Retrieved February 14, 2021.
  35. "Databricks raises $1.6B at $38B valuation as it blasts past $600M ARR". TechCrunch. Archived from the original on December 30, 2021. Retrieved July 1, 2021.
  36. Nishant, Niket; Hu, Krystal (September 14, 2023). "Databricks raises over $500 mln at $43 bln valuation". Reuters. Retrieved September 20, 2023.
  37. Armbrust, Michael; Ghodsi, Ali; Xin, Reynold; Zaharia, Matei (January 2021). "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics" (PDF). Conference on Innovative Data Systems Research. Archived (PDF) from the original on December 22, 2020. Retrieved July 29, 2021.
  38. "With massive $1B infusion, Databricks takes aim at IPO and rival Snowflake". SiliconANGLE. February 1, 2021. Archived from the original on April 6, 2023. Retrieved April 8, 2021.
  39. Simone, Stephanie (October 17, 2022). "Databricks achieves FedRAMP Authorized status". KMWorld. Information Today. Archived from the original on October 20, 2022. Retrieved October 20, 2022.
  40. "Databricks Cranks Delta Lake Performance, Nabs Redash for SQL Viz". Datanami. June 24, 2020. Archived from the original on July 9, 2020. Retrieved April 8, 2021.
  41. "Databricks launches Delta Lake, an open source data lake reliability project". VentureBeat. April 24, 2019. Archived from the original on March 24, 2022. Retrieved April 8, 2021.
  42. "Databricks launches SQL Analytics". TechCrunch. November 12, 2020. Archived from the original on September 4, 2024. Retrieved April 8, 2021.
  43. Brust, Andrew. "Databricks, champion of data "lakehouse" model, closes $1B series G funding round". ZDNet. Archived from the original on February 1, 2021. Retrieved April 8, 2021.
  44. "The Two Sigma Ventures Open Source Index". Two Sigma Ventures. Archived from the original on November 29, 2022. Retrieved April 8, 2021.
  45. "MLOps Tools - Ranking. OSS Insight". OSS Insight. Archived from the original on September 4, 2024. Retrieved April 3, 2024.
  46. "Databricks to run two massive online courses on Apache Spark". Databricks. December 2, 2014. Archived from the original on January 13, 2022. Retrieved December 16, 2016.
  47. "Data + AI Summit". Databricks. Archived from the original on April 23, 2022. Retrieved April 8, 2021.
  48. "Riding the data-powered AI wave: Inside Databricks' unified stack solution". Databricks. March 14, 2024. Archived from the original on September 4, 2024. Retrieved April 5, 2024.
  49. "Databricks open-sources its own large language model, DBRX". Databricks. March 27, 2024. Archived from the original on April 5, 2024. Retrieved April 5, 2024.
  50. "Inside the Creation of the World's Most Powerful Open Source AI Model". Databricks. March 27, 2024. Archived from the original on September 4, 2024. Retrieved April 5, 2024.
  51. "Databricks' new open-source AI model could offer enterprises a leaner alternative to OpenAI's GPT-3.5". Databricks. March 27, 2024. Archived from the original on September 4, 2024. Retrieved April 5, 2024.
  52. CNBC staff (June 16, 2020). "36. Databricks". CNBC. Archived from the original on December 24, 2022. Retrieved April 8, 2021.
  53. "Worldwide locations". Archived from the original on June 7, 2023. Retrieved October 20, 2022.