Company type | Private |
---|---|
Industry | Database Technologies |
Genre | Multi-Model DBMS |
Founded | April 2010; Austin, TX, USA |
Founders | Jonathan Ellis, Matt Pfeil |
Headquarters | Santa Clara, California, United States |
Key people | Chet Kapoor [1] (CEO), Davor Bonaci (CTO), Ed Anuff (CPO), Don Dixon (CFO), Brad Gyger (CRO), Jason McClelland (CMO), Chris Vogel (Chief People Officer) |
Number of employees | 800+ (June 2022) [2] |
Website | DataStax.com |
DataStax, Inc. is a company based in Santa Clara, California, that provides real-time data products for AI applications. [3] Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company had roughly 800 customers in more than 50 countries. [4] [5] [2]
DataStax was built on the open source NoSQL database Apache Cassandra. Cassandra was initially developed internally at Facebook to handle large data sets across multiple servers, [6] and was released as an Apache open source project in 2008. [7] In 2010, Jonathan Ellis and Matt Pfeil left Rackspace, where they had worked with Cassandra, to launch Riptano in Austin, Texas. [6] [8] Ellis and Pfeil later renamed the company DataStax, and moved its headquarters to Santa Clara, California. [3] [9]
The company went on to create its own enterprise version of Cassandra, a NoSQL database called DataStax Enterprise (DSE). [6]
In 2019, Chet Kapoor was named the company's new CEO, taking over from Billy Bosworth. [10]
In May 2020, DataStax released Astra DB, a DBaaS for Cassandra applications. [11] In November 2020, DataStax released K8ssandra, an open source distribution of Cassandra on Kubernetes. [12] In December 2020, DataStax released Stargate, an open source data API gateway. [13]
After acquiring streaming event vendor Kesque in January 2021, [14] the company launched Luna Streaming, a data streaming platform for Apache Pulsar. [15] DataStax then rebuilt the Kesque technology into Astra Streaming. [16] The Astra Streaming cloud service became generally available on June 29, 2022. [17] With the release, the company added API-level support for messaging tools Apache Kafka, RabbitMQ and Java Message Service, in addition to Apache Pulsar. [18] [19] Astra Streaming can connect to a larger data platform by utilizing DataStax’s Astra DB cloud service. [18]
In 2023, DataStax began incorporating artificial intelligence and machine learning into its platform. [20] In January 2023, the company acquired Kaskada, developer of a platform that helps organizations use data for AI applications. [21] DataStax made the formerly proprietary Kaskada technology open source and integrated it into its Luna ML service, which was launched on May 4, 2023. [22] With the acquisition, former Kaskada CEO Davor Bonaci was named DataStax chief technology officer and executive vice president. [22]
On May 24, 2023, DataStax announced a partnership with ThirdAI to bring large language models to DSE and Astra DB, to help developers build generative AI applications. [23]
In June 2023, the company announced the development of a GPT-based schema translator in its Astra Streaming cloud service. The Astra Streaming GPT Schema Translator uses generative AI to automatically generate schema mappings, to enable data integration and interoperability between multiple systems and data sources. [24]
On July 18, 2023, the company announced a partnership with Google to make semantic search available in its Astra DB cloud database for developers building generative AI applications. [20]
On September 13, 2023, DataStax launched the LangStream open source project, which works with Astra DB and supports vector databases including Milvus and Pinecone. LangStream enables developers to better work with streaming data sources, using Apache Kafka technology and generative AI to help build event-driven architectures. [25]
In November 2023, DataStax announced RAGStack, a simplified commercial offering for RAG (retrieval-augmented generation) based on LangChain and Astra DB vector search. [26]
Astra DB is available on cloud services such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform. [27] In February 2021, DataStax announced the serverless version of Astra DB, offering developers pay-as-you-go pricing. [28]
In March 2022, DataStax introduced change data capture (CDC) capabilities to its Astra DB cloud service. Astra DB CDC is powered by Apache Pulsar, allowing developers to manage operational and streaming data in one place. [29] DataStax leads the open-source Starlight project, which provides a compatibility layer for different protocols on top of Apache Pulsar. [18]
On February 8, 2023, DataStax launched Astra Block, a cloud service built on the Ethereum blockchain and available as part of Astra DB. Developers can use Astra Block to stream enhanced data from the Ethereum blockchain in order to build or scale Web3 applications on Astra DB. [30]
Astra DB supports open source LangChain technology, making it easier for developers to create generative AI applications. [20]
Version 1.0 of DataStax Enterprise (DSE), released in October 2011, was the first commercial distribution of the Cassandra database, designed to provide real-time application performance and heavy analytics on the same physical infrastructure. [31] [32] It grew to include advanced security controls, graph database models, operational analytics and advanced search capabilities. [33]
In April 2016, the company announced the release of DataStax Enterprise Graph, adding graph data model functionality to DSE. [34]
In March 2017, DataStax announced the release of DSE 5.1, which included improvements to search, security controls, graph data management and operational analytics performance. DataStax also announced a shift in strategy, with an added focus on customer experience applications. Rather than introducing a new set of technologies, the company started to offer best-practice advice to users of its core DSE platform. [35] [33]
In April 2018, DataStax released DSE 6, a version aimed at businesses using a hybrid cloud computing model. The release promised the benefits of a distributed cloud database on any public cloud or on-premises, along with twice the responsiveness and twice the throughput. [36] [37]
In December 2018, DataStax released DSE 6.7, which offered enterprise customers five key upgrades: improved analytics, geospatial search, improved data protection in the cloud, enhanced performance insights, and new developer integration tools, including an Apache Kafka connector and certified production Docker images. [38]
In April 2020, DataStax released DSE 6.8, offering enterprises new capabilities for bare-metal performance and support for more workloads, and adding a Kubernetes operator for Cassandra. [39]
DSE 7.0 was introduced in August 2023. It offers enhancements in cloud-native operations and generative AI capabilities, and includes vector search. [40]
In September 2014, DataStax raised $106 million in a Series E funding round, raising the total investment in the company to $190 million. [3] On June 15, 2022, the company announced it had raised an additional $115 million, at a $1.6 billion valuation. [2] [41]
In 2020, Mergermarket reported that DataStax was preparing for an initial public offering that could launch in 2021. [42] However, in June 2022, DataStax CEO Chet Kapoor said that the company would not rush into an IPO. [2]