Clustrix

Clustrix Inc
Company type: Private
Industry: Computer database
Founded: December 15, 2006 in San Francisco, California, U.S.
Founders: Paul Mikesell, Sergei Tsarev, Eric Hoffman
Headquarters: United States
Products: Clustrix Database Server
Number of employees: 40–50
Parent: MariaDB Corporation AB
Website: clustrix.com

Clustrix, Inc. is a San Francisco-based private company founded in 2006 that developed a database management system marketed as NewSQL. [1] [2]

History

Clustrix was founded in November 2006, and is sometimes called Sprout-Clustrix as it formed with the help of Y Combinator. [3] Founders include Paul Mikesell (formerly of EMC Isilon) and Sergei Tsarev. Some of its technology had been in testing at customer sites since 2008. [4]

The product was called Sierra during development; at its official announcement in 2010, it was launched under the name Clustered Database System (CDS). [5] [6] The company received $10 million in funding from Sequoia Capital, U.S. Venture Partners (USVP), and ATA Ventures in December 2010. [7] Robin Purohit became chief executive in October 2011, and another round of $6.75 million was raised in July 2012. [8] [9] Another round of $16.5 million from the original backers was announced in May 2013, [10] and a round of $10 million in new funding in August 2013 was led by HighBAR Ventures. [7] Purohit was replaced by Mike Azevedo in 2014. [11] A round of over $23 million in debt financing was disclosed in February 2016. [12] On September 20, 2018, it was announced that Clustrix had been acquired by MariaDB Corporation. [13]

Technology

Clustrix supports workloads that involve scaling transactions and real-time analytics. The system is a drop-in replacement for MySQL, and is designed to overcome MySQL scalability issues with a minimum of disruption. [14] It also has built-in fault-tolerance features for high availability within a cluster, and parallel backup and parallel replication among clusters for disaster recovery. Clustrix is a scale-out SQL database management system and part of what are often called the NewSQL database systems (modern relational database management systems), closely following the NoSQL movement. [15]

The product was marketed as a hardware "appliance" using InfiniBand through about 2014. [16] [6] [17] Clustrix's database was made available as downloadable software and from the Amazon Web Services Marketplace by 2013. [18] [19]

Primary competitors such as Microsoft SQL Server and MySQL supported online transaction processing and online analytical processing but were not distributed. Clustrix provides a distributed relational, ACID database that scales transactions [20] and supports real-time analytics. Other distributed relational databases, including EMC Greenplum, HP Vertica, Infobright, and Amazon Redshift, are columnar (they do not support primary transaction workloads) and focus on offline analytics. Notable players in the primary SQL database space are in-memory, including VoltDB and MemSQL, which excel at low-latency transactions but do not target real-time analytics.[citation needed] NoSQL competitors such as MongoDB are good at handling unstructured data and read-heavy workloads but do not compete in the space for write-heavy workloads (no transactions, coarse-grained (DB-level) locking, and no SQL features such as joins), so the NewSQL and NoSQL databases are complementary.[citation needed]

Query evaluation

The Clustrix database operates on a distributed cluster of shared-nothing nodes using a query-to-data approach. [21] Each node typically owns a subset of the data. SQL queries are split into query fragments and sent to the nodes that own the relevant data. This enables Clustrix to scale horizontally (scale out) as additional nodes are added. [18]
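
The query-to-data idea can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Clustrix's actual planner or wire protocol: the `Node` class, the `run_fragment` method, and the `distributed_sum` coordinator are hypothetical names introduced only for illustration. Each node evaluates a fragment against the rows it owns, and only the partial result is sent back to be combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class Node:
    """Hypothetical shared-nothing node that owns a subset of the rows."""
    name: str
    rows: List[Row] = field(default_factory=list)

    def run_fragment(self, predicate: Callable[[Row], bool], column: str) -> int:
        # The fragment runs where the data lives; only a partial sum
        # is returned to the coordinator, not the rows themselves.
        return sum(row[column] for row in self.rows if predicate(row))


def distributed_sum(nodes: List[Node],
                    predicate: Callable[[Row], bool],
                    column: str) -> int:
    """Send the same fragment to every node and combine the partial sums."""
    partials = [node.run_fragment(predicate, column) for node in nodes]
    return sum(partials)


# Three nodes, each owning different rows of one logical table.
nodes = [
    Node("n1", [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]),
    Node("n2", [{"id": 3, "amount": 40}]),
    Node("n3", [{"id": 4, "amount": 5}, {"id": 5, "amount": 20}]),
]

# Roughly: SELECT SUM(amount) FROM t WHERE amount >= 10
total = distributed_sum(nodes, lambda r: r["amount"] >= 10, "amount")
print(total)
```

In this model, adding a node adds both storage and the capacity to evaluate fragments, which is the intuition behind scaling out.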

Data distribution

The Clustrix database automatically splits data into slices and distributes them evenly across nodes, with each slice having copies on other nodes. [22] Uniform data distribution is maintained as nodes are added or removed, or when data is inserted unevenly. This automatic data distribution approach removes the need to shard and enables Clustrix to maintain database availability in the face of node loss. [23]
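
A minimal Python sketch of slice placement with redundant copies follows. The hashing scheme, the round-robin placement, and the function names (`slice_for_key`, `place_slices`, `surviving_slices`) are assumptions chosen for illustration; the source does not describe Clustrix's actual distribution or rebalancing algorithm. The sketch shows only why keeping each slice on more than one node lets the data stay available after a single node is lost.

```python
import hashlib
from typing import Dict, List


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a row key to a slice using a stable hash (illustrative choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def place_slices(node_names: List[str], num_slices: int,
                 copies: int = 2) -> Dict[int, List[str]]:
    """Assign each slice to `copies` distinct nodes in round-robin order."""
    if copies > len(node_names):
        raise ValueError("need at least as many nodes as copies")
    return {
        s: [node_names[(s + i) % len(node_names)] for i in range(copies)]
        for s in range(num_slices)
    }


def surviving_slices(placement: Dict[int, List[str]],
                     failed_node: str) -> Dict[int, List[str]]:
    """Return the owners that remain for each slice after one node fails."""
    return {
        s: [n for n in owners if n != failed_node]
        for s, owners in placement.items()
    }


placement = place_slices(["n1", "n2", "n3"], num_slices=6)
after_failure = surviving_slices(placement, "n2")

# Every slice still has at least one owner once "n2" is gone.
all_available = all(len(owners) >= 1 for owners in after_failure.values())
print(all_available)
```

A real system would also re-create the lost copies on the remaining nodes and move slices when nodes are added; that rebalancing step is omitted here.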

Performance

In a performance test completed by Percona in 2011, a three-node cluster saw about a 73% increase in speed over a similarly equipped single MySQL server running tests with 1024 simultaneous threads. [24] [25] Additional nodes added to the Clustrix cluster provided roughly linear increases in speed. [26]

Project cancellation

MariaDB announced in October 2023 that Xpand (formerly known as Clustrix) had been discontinued. [27]

References

  1. "What we talk about when we talk about NewSQL". Archived from the original on 2012-09-05. Retrieved 2011-12-16.
  2. "The NewSQL Movement". Archived from the original on 1 February 2012. Retrieved 2011-12-16.
  3. "Form D: Notice of Sale of Securities". United States Securities and Exchange Commission. July 5, 2007. Archived from the original on April 8, 2016. Retrieved September 5, 2016.
  4. "The Clustrix story". DBMS2 Blog. May 12, 2010. Retrieved September 5, 2016.
  5. Camille Riketts (May 3, 2010). "Y Combinator's Clustrix rolls out databases that scale". Venture Beat. Retrieved September 5, 2016.
  6. Stacey Higginbotham (May 3, 2010). "Clustrix Builds the Webscale Holy Grail: A Database That Scales". Gigaom. Retrieved September 5, 2016.
  7. Barb Darrow (August 19, 2013). "Clustrix bags $10M more in funding to keep scaling out its SQL database". Gigaom. Retrieved September 5, 2016.
  8. Robin Wauters (October 18, 2011). "Clustrix Lands Former Hewlett-Packard VP Robin Purohit As Its New CEO". Tech Crunch. Retrieved September 5, 2016.
  9. Ryan Lawler (July 5, 2012). "Big Data Startup Clustrix Raises $6.75 Million From Sequoia And Others To Build Scalable Databases". Tech Crunch. Retrieved September 5, 2016.
  10. Barb Darrow (May 6, 2013). "Clustrix nets $16.5M to push its database outside the box". Gigaom. Retrieved September 5, 2016.
  11. "Clustrix Names New CEO Mike Azevedo and Executive Chairman Bruce Armstrong". Wall Street Journal. September 9, 2014. Retrieved September 5, 2016.
  12. "Form D: Notice Exempt Offering of Securities". United States Securities and Exchange Commission. February 12, 2016. Retrieved September 5, 2016.
  13. "MariaDB Acquires Clustrix Adding Distributed Database Technology". February 20, 2018. Retrieved September 20, 2018.
  14. Derrick Harris (January 17, 2011). "Clustrix Lifts the Curtain on Early Database Customers". Gigaom via The New York Times. Retrieved September 5, 2016.
  15. "Google Spanner's most surprising revelation: NoSQL is Out and NewSQL is in".
  16. James Hamilton (May 5, 2010). "Clustrix Database Appliance". Retrieved September 5, 2016.
  17. "Clustrix Database Appliance". Company Documentation. Archived from the original on February 2, 2014. Retrieved September 5, 2016.
  18. Jon Evans (January 19, 2013). "Your Database Is Probably Terrible". Tech Crunch. Retrieved September 5, 2016.
  19. "Clustrix Announces General Availability of ClustrixDB as a Software Release". Database Trends and Applications. October 31, 2013. Retrieved September 5, 2016.
  20. "10 Companies & Technologies to Watch in 2013 | Inside Analysis". Archived from the original on 2013-03-10. Retrieved 2013-02-21.
  21. "Archived copy" (PDF). Archived from the original (PDF) on 2013-09-29. Retrieved 2013-02-21.
  22. http://cs.brown.edu/courses/cs227/slides/checkpointing/clustrix.pdf
  23. http://cattell.net/datastores/Datastores.pdf
  24. Vadim Tkachenko and Rodrigo Gadea (October 20, 2011). "Clustrix tpcc-mysql Benchmark" (PDF). Percona. Archived from the original (PDF) on February 12, 2012. Retrieved September 5, 2016.
  25. Paul Mikesell and Aaron Passey (October 25, 2011). "Opening Keynote: Characterizing Performance". Percona Live London. Retrieved September 5, 2016.
  26. "Clustrix Delivers Software-Only Kit to Demo Shard-less MySQL Scaling".
  27. https://www.theregister.com/2023/10/13/mariadb_restructure