DataStax

DataStax
Company type Private
Industry Database Technologies
Genre Multi-Model DBMS
Founded April 2010
Founder
  • Jonathan Ellis
  • Matt Pfeil
Headquarters Santa Clara, California,
United States
Key people
  • Chet Kapoor [1] (CEO)
  • Davor Bonaci (CTO)
  • Ed Anuff (CPO)
  • Don Dixon (CFO)
  • Brad Gyger (CRO)
  • Jason McClelland (CMO)
  • Chris Vogel (CPO)
Number of employees
800+ (June 2022) [2]
Website www.datastax.com

DataStax, Inc. is a real-time data for AI company based in Santa Clara, California. [3] Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company had roughly 800 customers in over 50 countries. [4] [5] [2]

History

DataStax was built on the open source NoSQL database Apache Cassandra. Cassandra was initially developed internally at Facebook to handle large data sets across multiple servers, [6] and was released as an Apache open source project in 2008. [7] In 2010, Jonathan Ellis and Matt Pfeil left Rackspace, where they had worked with Cassandra, to launch Riptano in Austin, Texas. [6] [8] Ellis and Pfeil later renamed the company DataStax, and moved its headquarters to Santa Clara, California. [3] [9]

The company went on to create its own enterprise version of Cassandra, a NoSQL database called DataStax Enterprise (DSE). [6]

In 2019, Chet Kapoor was named the company's new CEO, taking over from Billy Bosworth. [10]

In May 2020, DataStax released Astra DB, a DBaaS for Cassandra applications. [11] In November 2020, DataStax released K8ssandra, an open source distribution of Cassandra on Kubernetes. [12] In December 2020, DataStax released Stargate, an open source data API gateway. [13]

After acquiring streaming event vendor Kesque in January 2021, [14] the company launched Luna Streaming, a data streaming platform for Apache Pulsar. [15] DataStax then rebuilt the Kesque technology into Astra Streaming. [16] The Astra Streaming cloud service became generally available on June 29, 2022. [17] With the release, the company added API-level support for messaging tools Apache Kafka, RabbitMQ and Java Message Service, in addition to Apache Pulsar. [18] [19] Astra Streaming can connect to a larger data platform by utilizing DataStax’s Astra DB cloud service. [18]

In 2023, DataStax began incorporating artificial intelligence and machine learning into its platform. [20] In January 2023, the company acquired Kaskada, developer of a platform that helps organizations use data for AI applications. [21] DataStax made the formerly proprietary Kaskada technology open source and integrated it into its Luna ML service, which was launched on May 4, 2023. [22] With the acquisition, former Kaskada CEO Davor Bonaci was named DataStax chief technology officer and executive vice president. [22]

On May 24, 2023, DataStax announced a partnership with ThirdAI to bring large language models to DSE and Astra DB, to help developers build generative AI applications. [23]

In June 2023, the company announced the development of a GPT-based schema translator in its Astra Streaming cloud service. The Astra Streaming GPT Schema Translator uses generative AI to automatically generate schema mappings, to enable data integration and interoperability between multiple systems and data sources. [24]

On July 18, 2023, the company announced a partnership with Google to make semantic search available in its Astra DB cloud database for developers building generative AI applications. [20]

On September 13, 2023, DataStax launched the LangStream open source project, which works with Astra DB and supports vector databases including Milvus and Pinecone. LangStream enables developers to better work with streaming data sources, using Apache Kafka technology and generative AI to help build event-driven architectures. [25]

In November 2023, DataStax announced RAGStack, a simplified commercial offering for RAG (retrieval-augmented generation) based on LangChain and Astra DB vector search. [26]

Products

Astra DB

Astra DB is available on cloud services such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform. [27] In February 2021, DataStax announced the serverless version of Astra DB, offering developers a pay-as-you-go pricing model. [28]

In March 2022, DataStax introduced change data capture (CDC) capabilities to its Astra DB cloud service. Astra DB CDC is powered by Apache Pulsar, which allows developers to manage operational and streaming data in one place. [29] DataStax leads the open-source Starlight project, which provides a compatibility layer for different protocols on top of Apache Pulsar. [18]

On February 8, 2023, DataStax launched Astra Block, a cloud-based service based on the Ethereum blockchain to support building Web3 applications, available as part of Astra DB. Astra Block can be used by developers to stream enhanced data from the Ethereum blockchain to build or scale Web3 experiences on Astra DB. [30]

Astra DB supports open source LangChain technology, making it easier for developers to create generative AI applications. [20]

DSE

Version 1.0 of DataStax Enterprise (DSE), released in October 2011, was the first commercial distribution of the Cassandra database, designed to provide real-time application performance and heavy analytics on the same physical infrastructure. [31] [32] It grew to include advanced security controls, graph database models, operational analytics and advanced search capabilities. [33]

In April 2016, the company announced the release of DataStax Enterprise Graph, adding graph data model functionality to DSE. [34]

In March 2017, DataStax announced the release of its DSE platform 5.1, which included improved search capabilities, improved security control, improvements to its Graph data management and improvements to operational analytics performance. DataStax also announced a shift in strategy, with an added focus on customer experience applications. Rather than a new set of technologies, the company started to offer advice on best practice to users of its core DSE platform. [35] [33]

In April 2018, DataStax released DSE 6, a version aimed at businesses using a hybrid cloud computing model. According to the company, it offered the benefits of a distributed cloud database on any public cloud or on-premises, with twice the responsiveness and the ability to handle twice the throughput. [36] [37]

In December 2018, DataStax released DSE 6.7, which offered enterprise customers five feature upgrades: improved analytics, geospatial search, improved data protection in the cloud, enhanced performance insights, and new developer integration tools, including an Apache Kafka connector and certified production Docker images. [38]

In April 2020, DataStax released DSE 6.8, offering enterprises new capabilities for bare-metal performance, support for more workloads, and a Kubernetes operator for Cassandra. [39]

DSE 7.0 was introduced in August 2023. It offers enhancements in cloud-native operations and generative AI capabilities, and includes vector search. [40]

Funding and IPO

In September 2014, DataStax raised $106 million in a Series E funding round, raising the total investment in the company to $190 million. [3] On June 15, 2022, the company announced it had raised an additional $115 million, at a $1.6 billion valuation. [2] [41]

In 2020, Mergermarket reported that DataStax was preparing for an initial public offering that could launch in 2021. [42] However, in June 2022, DataStax CEO Chet Kapoor said that the company would not rush into an IPO. [2]

References

  1. "Announcing Our New CEO".
  2. "Cassandra vendor DataStax secures $115m investment for $1.6b valuation". theregister.com. Retrieved August 8, 2022.
  3. Gage, Deborah (September 4, 2014). "DataStax Raises $106 Million in New Pre-IPO Round, Chips Away at Oracle". Wall Street Journal.
  4. Banks, Martin (October 6, 2017). "DataStax adds Oracle to provide practical collaboration". Diginomica.com.
  5. Clancy, Heather (April 14, 2015). "DataStax just scored a big partnership with HP. Here's why". Fortune.
  6. "Out in the Open: The Abandoned Facebook Tech That Now Helps Power Apple". Wired. August 4, 2014. Retrieved September 18, 2017.
  7. Jackson, Joab (October 18, 2011). "Apache Cassandra Ready for the Enterprise". CIO. Archived from the original on September 6, 2018. Retrieved September 5, 2018.
  8. Clark, Don (October 26, 2010). "Start-Up Riptano Predicts Success With Cassandra Database". Wall Street Journal.
  9. Harris, Derrick (September 4, 2014). "NoSQL is growing up, and DataStax just raised $106M to prove it". gigaom.com.
  10. "Former Google VP Chet Kapoor joins DataStax as CEO". siliconangle.com. October 22, 2019. Retrieved February 22, 2021.
  11. "Cassandra Now Officially In the Cloud with Datastax Astra". datanami.com. May 12, 2020. Retrieved February 26, 2021.
  12. "DataStax unveils K8ssandra as cloud-native Cassandra". ZDNet. Retrieved February 26, 2021.
  13. "Meet Stargate, DataStax's GraphQL for databases". ZDNet. Retrieved February 26, 2021.
  14. "DataStax enters event streaming market with Apache Pulsar". techtarget.com. Retrieved August 8, 2022.
  15. "DataStax acquires Kesque". techcrunch.com. Retrieved February 26, 2021.
  16. "DataStax cofounder on evolving Cassandra for modern workloads". venturebeat.com. Retrieved August 8, 2022.
  17. "DataStax Astra gets support for Kafka, RabbitMQ and JMS in bid to capture the 'full data story'". diginomica.com. June 29, 2022. Retrieved March 30, 2023.
  18. "DataStax extends Astra Streaming event data platform". techtarget.com. Retrieved August 8, 2022.
  19. "DataStax Astra gets support for Kafka, RabbitMQ and JMS in bid to capture the 'full data story'". diginomica.com. Retrieved August 8, 2022.
  20. "DataStax brings vector database search to multicloud with Astra DB". venturebeat.com. Retrieved December 1, 2023.
  21. "AI feature engineering is focus as DataStax acquires Kaskada". venturebeat.com. Retrieved December 1, 2023.
  22. "DataStax extends AI feature engineering with Luna ML". venturebeat.com. Retrieved December 1, 2023.
  23. "DataStax taps ThirdAI to bring generative AI to its database offerings". infoworld.com. Retrieved December 1, 2023.
  24. "DataStax Plumbs AI Into Smarter Data Pipelines". forbes.com. Retrieved December 1, 2023.
  25. "DataStax takes aim at event-driven AI with open source LangStream project". venturebeat.com. Retrieved December 1, 2023.
  26. "With RAGStack, DataStax enables generative AI models to gain additional context from third-party data". siliconangle.com. Retrieved December 1, 2023.
  27. "DataStax offers serverless, NoSQL Astra DB across multiple regions, clouds". infoworld.com. Retrieved August 8, 2022.
  28. "DataStax Astra serverless DBaaS optimizes deployments". techtarget.com. Retrieved August 8, 2022.
  29. "DataStax CEO: Every use case doesn't need a new database". infoworld.com. Retrieved August 8, 2022.
  30. "DataStax launches Astra Block to support Web3 applications". infoworld.com. Retrieved December 1, 2023.
  31. Cohan, Peter (November 24, 2017). "DataStax Partners With Oracle In $46B Database Market". Forbes.com. Archived from the original on September 5, 2018. Retrieved September 5, 2018.
  32. Harris, Derrick (September 20, 2011). "DataStax gets $11M, fuses NoSQL and Hadoop". gigaom.com.
  33. Carey, Scott (October 4, 2017). "How DataStax wants its NoSQL platform to drive the 'right now economy'". Computerworld UK. Archived from the original on September 5, 2018. Retrieved September 5, 2018.
  34. Miller, Ron (April 12, 2016). "DataStax adds graph databases to enterprise Cassandra product set". techcrunch.com.
  35. "DataStax CEO launches new CX strategy – focus shifting from tech to business". diginomica. March 15, 2017. Retrieved September 12, 2017.
  36. Sargent, Jenna (April 19, 2018). "DataStax Enterprise 6 released with double the Apache Cassandra performance". SD Times.
  37. Whiting, Rick (April 17, 2018). "DataStax Pushes The Cloud Database Performance Boundary With New Release". crn.com.
  38. "DataStax announces the release of DSE 6.7". datastax.com.
  39. "DataStax". crn.com. April 28, 2020. Retrieved February 22, 2021.
  40. "DataStax Announces Vector Search for DataStax Enterprise". datanami.com. Retrieved December 1, 2023.
  41. "DataStax raises $115M to advance its data stack". techtarget.com. Retrieved August 8, 2022.
  42. "Venture Capital-Backed Tech Firm Exits To Watch In 2021". forbes.com. Retrieved February 22, 2021.