DataStax

Type Private
Industry Database Technologies
Genre Multi-Model DBMS
Founded April 2010, Austin, TX, USA
Founder
  • Jonathan Ellis
  • Matt Pfeil
Headquarters Santa Clara, California, United States
Key people
Chet Kapoor [1] (CEO)
Davor Bonaci (CTO)
Ed Anuff (CPO)
Don Dixon (CFO)
Brad Gyger (CRO)
Jason McClelland (CMO)
Chris Vogel (Chief People Officer)
Number of employees
800+ (June 2022) [2]
Website DataStax.com

DataStax, Inc. is a real-time data company focused on AI applications, based in Santa Clara, California. [3] Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company had roughly 800 customers in over 50 countries. [4] [5] [2]

History

DataStax was built on the open source NoSQL database Apache Cassandra. Cassandra was initially developed internally at Facebook to handle large data sets across multiple servers, [6] and was released as an Apache open source project in 2008. [7] In 2010, Jonathan Ellis and Matt Pfeil left Rackspace, where they had worked with Cassandra, to launch Riptano in Austin, Texas. [6] [8] Ellis and Pfeil later renamed the company DataStax, and moved its headquarters to Santa Clara, California. [3] [9]

The company went on to create its own enterprise version of Cassandra, a NoSQL database called DataStax Enterprise (DSE). [6]

In 2019, Chet Kapoor was named the company's new CEO, taking over from Billy Bosworth. [10]

Figure: Original DataStax logo.

In May 2020, DataStax released Astra DB, a DBaaS for Cassandra applications. [11] In November 2020, DataStax released K8ssandra, an open source distribution of Cassandra on Kubernetes. [12] In December 2020, DataStax released Stargate, an open source data API gateway. [13]

After acquiring streaming event vendor Kesque in January 2021, [14] the company launched Luna Streaming, a data streaming platform for Apache Pulsar. [15] DataStax then rebuilt the Kesque technology into Astra Streaming. [16] The Astra Streaming cloud service became generally available on June 29, 2022. [17] With the release, the company added API-level support for messaging tools Apache Kafka, RabbitMQ and Java Message Service, in addition to Apache Pulsar. [18] [19] Astra Streaming can connect to a larger data platform by utilizing DataStax’s Astra DB cloud service. [18]

Starting in 2023, DataStax began incorporating artificial intelligence and machine learning into its platform. [20] In January 2023, the company acquired Kaskada, developer of a platform that helps organizations use data for AI applications. [21] DataStax made the formerly proprietary Kaskada technology open source, and integrated it into its Luna ML service, which was launched on May 4, 2023. [22] With the acquisition, former Kaskada CEO Davor Bonaci was named DataStax chief technology officer and executive vice president. [22]

On May 24, 2023, DataStax announced a partnership with ThirdAI to bring large language models to DSE and Astra DB, to help developers build generative AI applications. [23]

In June 2023, the company announced the development of a GPT-based schema translator in its Astra Streaming cloud service. The Astra Streaming GPT Schema Translator uses generative AI to automatically generate schema mappings, to enable data integration and interoperability between multiple systems and data sources. [24]

On July 18, 2023, the company announced a partnership with Google to make semantic search available in its Astra DB cloud database for developers building generative AI applications. [20]

On September 13, 2023, DataStax launched the LangStream open source project, which works with Astra DB and supports vector databases including Milvus and Pinecone. LangStream enables developers to better work with streaming data sources, using Apache Kafka technology and generative AI to help build event-driven architectures. [25]

In November 2023, DataStax announced RAGStack, a simplified commercial offering for RAG (retrieval-augmented generation) based on LangChain and Astra DB vector search. [26]

Products

Astra DB

Astra DB is available on cloud services such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform. [27] In February 2021, DataStax announced the serverless version of Astra DB, offering developers pay-as-you-go pricing. [28]

In March 2022, DataStax introduced new change data capture (CDC) capabilities to its Astra DB cloud service. Astra DB CDC is powered by Apache Pulsar, which allows developers to manage operational and streaming data in one place. [29] DataStax leads the open-source Starlight project, which provides a compatibility layer for different protocols on top of Apache Pulsar. [18]

On February 8, 2023, DataStax launched Astra Block, a cloud-based service based on the Ethereum blockchain to support building Web3 applications, available as part of Astra DB. Astra Block can be used by developers to stream enhanced data from the Ethereum blockchain to build or scale Web3 experiences on Astra DB. [30]

Astra DB supports open source LangChain technology, making it easier for developers to create generative AI applications. [20]

DSE

Version 1.0 of DataStax Enterprise (DSE), released in October 2011, was the first commercial distribution of the Cassandra database, designed to provide real-time application performance and heavy analytics on the same physical infrastructure. [31] [32] It grew to include advanced security controls, graph database models, operational analytics and advanced search capabilities. [33]

In April 2016, the company announced the release of DataStax Enterprise Graph, adding graph data model functionality to DSE. [34]

In March 2017, DataStax announced the release of its DSE platform 5.1, which included improved search capabilities, improved security control, improvements to its Graph data management and improvements to operational analytics performance. DataStax also announced a shift in strategy, with an added focus on customer experience applications. Rather than a new set of technologies, the company started to offer advice on best practice to users of its core DSE platform. [35] [33]

In April 2018, DataStax released DSE 6, a version aimed at businesses using a hybrid cloud computing model. The release was described as delivering the benefits of a distributed cloud database on any public cloud or on-premises, with twice the responsiveness and the ability to handle twice the throughput. [36] [37]

In December 2018, DataStax released DSE 6.7, which added five main feature upgrades for enterprise customers: improved analytics, geospatial search, improved data protection in the cloud, enhanced performance insights, and new developer integration tools, including an Apache Kafka connector and certified production Docker images. [38]

In April 2020, DataStax released DSE 6.8, offering enterprises new capabilities for bare-metal performance and support for more workloads, along with a Kubernetes operator for Cassandra. [39]

DSE 7.0 was introduced in August 2023. It offers enhancements in cloud-native operations and generative AI capabilities, and includes vector search. [40]

Funding and IPO

In September 2014, DataStax raised $106 million in a Series E funding round, raising the total investment in the company to $190 million. [3] On June 15, 2022, the company announced it had raised an additional $115 million, at a $1.6 billion valuation. [2] [41]

In 2020, Mergermarket reported that DataStax was preparing for an initial public offering that could launch in 2021. [42] However, in June 2022, DataStax CEO Chet Kapoor said that the company would not rush into an IPO. [2]

References

  1. "Announcing Our New CEO".
  2. "Cassandra vendor DataStax secures $115m investment for $1.6b valuation". theregister.com. Retrieved August 8, 2022.
  3. Gage, Deborah (4 September 2014). "DataStax Raises $106 Million in New Pre-IPO Round, Chips Away at Oracle". Wall Street Journal.
  4. Banks, Martin (6 October 2017). "DataStax adds Oracle to provide practical collaboration". Diginomica.com.
  5. Clancy, Heather (14 April 2015). "DataStax just scored a big partnership with HP. Here's why". Fortune.
  6. "Out in the Open: The Abandoned Facebook Tech That Now Helps Power Apple". Wired. 4 August 2014. Retrieved 18 September 2017.
  7. Jackson, Joab (18 October 2011). "Apache Cassandra Ready for the Enterprise". CIO. Archived from the original on 6 September 2018. Retrieved 5 September 2018.
  8. Clark, Don (26 October 2010). "Start-Up Riptano Predicts Success With Cassandra Database". Wall Street Journal.
  9. Harris, Derrick (4 September 2014). "NoSQL is growing up, and DataStax just raised $106M to prove it". gigaom.com.
  10. "Former Google VP Chet Kapoor joins DataStax as CEO". siliconangle.com. 22 October 2019. Retrieved February 22, 2021.
  11. "Cassandra Now Officially In the Cloud with Datastax Astra". datanami.com. 12 May 2020. Retrieved February 26, 2021.
  12. "DataStax unveils K8ssandra as cloud-native Cassandra". zdnet.com. Retrieved February 26, 2021.
  13. "Meet Stargate, DataStax's GraphQL for databases". zdnet.com. Retrieved February 26, 2021.
  14. "DataStax enters event streaming market with Apache Pulsar". techtarget.com. Retrieved August 8, 2022.
  15. "DataStax acquires Kesque". techcrunch.com. Retrieved February 26, 2021.
  16. "DataStax cofounder on evolving Cassandra for modern workloads". venturebeat.com. Retrieved August 8, 2022.
  17. "DataStax Astra gets support for Kafka, RabbitMQ and JMS in bid to capture the 'full data story'". diginomica.com. 2022-06-29. Retrieved 2023-03-30.
  18. "DataStax extends Astra Streaming event data platform". techtarget.com. Retrieved August 8, 2022.
  19. "DataStax Astra gets support for Kafka, RabbitMQ and JMS in bid to capture the 'full data story'". diginomica.com. Retrieved August 8, 2022.
  20. "DataStax brings vector database search to multicloud with Astra DB". venturebeat.com. Retrieved December 1, 2023.
  21. "AI feature engineering is focus as DataStax acquires Kaskada". venturebeat.com. Retrieved December 1, 2023.
  22. "DataStax extends AI feature engineering with Luna ML". venturebeat.com. Retrieved December 1, 2023.
  23. "DataStax taps ThirdAI to bring generative AI to its database offerings". infoworld.com. Retrieved December 1, 2023.
  24. "DataStax Plumbs AI Into Smarter Data Pipelines". forbes.com. Retrieved December 1, 2023.
  25. "DataStax takes aim at event-driven AI with open source LangStream project". venturebeat.com. Retrieved December 1, 2023.
  26. "With RAGStack, DataStax enables generative AI models to gain additional context from third-party data". siliconangle.com. Retrieved December 1, 2023.
  27. "DataStax offers serverless, NoSQL Astra DB across multiple regions, clouds". infoworld.com. Retrieved August 8, 2022.
  28. "DataStax Astra serverless DBaaS optimizes deployments". techtarget.com. Retrieved August 8, 2022.
  29. "DataStax CEO: Every use case doesn't need a new database". infoworld.com. Retrieved August 8, 2022.
  30. "DataStax launches Astra Block to support Web3 applications". infoworld.com. Retrieved December 1, 2023.
  31. Cohan, Peter (24 Nov 2017). "DataStax Partners With Oracle In $46B Database Market". Forbes.com. Archived from the original on 5 September 2018. Retrieved 5 September 2018.
  32. Harris, Derrick (20 September 2011). "DataStax gets $11M, fuses NoSQL and Hadoop". gigaom.com.
  33. Carey, Scott (4 October 2017). "How DataStax wants its NoSQL platform to drive the 'right now economy'". Computerworld UK. Archived from the original on 5 September 2018. Retrieved 5 September 2018.
  34. Miller, Ron (12 April 2016). "DataStax adds graph databases to enterprise Cassandra product set". techcrunch.com.
  35. "DataStax CEO launches new CX strategy – focus shifting from tech to business". diginomica. 15 March 2017. Retrieved 12 September 2017.
  36. Sargent, Jenna (19 April 2018). "DataStax Enterprise 6 released with double the Apache Cassandra performance". SD Times.
  37. Whiting, Rick (17 April 2018). "DataStax Pushes The Cloud Database Performance Boundary With New Release". crn.com.
  38. "DataStax announces the release of DSE 6.7". datastax.com.
  39. "DataStax". crn.com. 28 April 2020. Retrieved February 22, 2021.
  40. "DataStax Announces Vector Search for DataStax Enterprise". datanami.com. Retrieved December 1, 2023.
  41. "DataStax raises $115M to advance its data stack". techtarget.com. Retrieved August 8, 2022.
  42. "Venture Capital-Backed Tech Firm Exits To Watch In 2021". forbes.com. Retrieved February 22, 2021.