SingleStore

Genre: RDBMS
Founded: January 2011
Founders:
  • Eric Frenkiel
  • Nikita Shamgunov
  • Adam Prout
Headquarters: San Francisco, California
Area served: Worldwide
Number of employees: 350 [1]
Website: www.singlestore.com

SingleStore (formerly MemSQL) is a proprietary, cloud-native database designed for data-intensive applications. [2] It is a distributed, relational SQL database management system (RDBMS) [3] with ANSI SQL support, known for fast data ingest, transaction processing, and query processing. [4] [2]

SingleStore primarily stores relational data, though it can also store JSON data, graph data, and time series data. It supports blended workloads, commonly referred to as hybrid transactional/analytical processing (HTAP) workloads, as well as more traditional online transaction processing (OLTP) and online analytical processing (OLAP) use cases. For queries, it compiles Structured Query Language (SQL) into machine code. The SingleStore database engine can run in various Linux environments, including on-premises installations, public and private cloud providers, and containers managed by a Kubernetes operator, or it can be used as a hosted cloud service known as SingleStore Managed Service. [5] [6]

History


On April 23, 2013, SingleStore launched the first generally available version of its database to the public as MemSQL. [7] Early versions supported only row-oriented tables and were highly optimized for cases where all data could fit within main memory. This design was based on the idea that the cost of RAM would continue to decrease exponentially over time, in a trend similar to Moore's law, which would eventually allow most database use cases to store their data exclusively in memory.

Shortly after launch, MemSQL added general support for an on-disk column-based storage format to work alongside the in-memory rowstore. [8] The decreases in cost of memory slowed over time, and the market for purely in-memory database systems largely failed to materialize, with increasing demand for disk-based OLAP workloads. Thus, over time, MemSQL's columnstore became a major focus and a crucial feature for customers.

On October 27, 2020, MemSQL rebranded to SingleStore to reflect a shift in focus away from exclusively in-memory workloads. The new name highlights the goal of achieving a universal storage format capable of supporting both transactional and analytical use cases. [9]

With version 7.5, SingleStore reportedly became the first database to combine separation of storage and compute with system-of-record capability in a single platform. The company is headquartered in San Francisco, California. In June 2021 it opened an office in Raleigh, North Carolina, and as part of the opening launched Launch Pad, a center for innovation intended to incubate and prototype solutions. Its other offices are in Sunnyvale, California; Seattle, Washington; and Lisbon, Portugal. [10]

Funding

In January 2013, SingleStore announced that it had raised $5 million. Since then, the company has raised $318.1 million from investors including Khosla Ventures, Accel, Google Ventures, Dell Capital, and HPE. [11]

Funding Rounds

Series  Date        Amount (million $)  Lead Investors
A       2013        5                   DVCA, IA Ventures
B       2014        35 [5]              Accel
C       2016        36 [5]              Caffeinated Capital, REV
D       2018        30 [3]              Google Ventures, Glynn Capital
E       Dec. 2020   80 [12]             Insight Partners
F       Sept. 2021  80 [4]              Insight Partners
G       July 2022   116 [13]            Goldman Sachs Asset Management

Architecture

Row and column table formats

SingleStore can store data in either row-oriented tables ("rowstores") or column-oriented tables ("columnstores"). The format used is determined by the user when creating the table.

Rowstore tables, as the name implies, store information in row format, which is the traditional data format used by relational database systems. Rowstores are optimized for queries that insert, update, or delete a single row or a small number of rows, and are most closely associated with OLTP (transactional) use cases. Data for rowstore tables is stored completely in memory, making random reads fast, with snapshots and transaction logs persisted to disk.

Columnstores are optimized for complex SELECT queries, typically associated with OLAP (analytics) and data warehousing use cases. As an example, a large clinical data set for data analysis is best stored in columnar format, since queries run against it will typically be ad hoc queries where aggregates are computed over large numbers of similar data items. Data for columnstore tables is stored on-disk, supporting fast sequential reads and compression that typically reaches 5-10x.
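The difference between the two layouts can be sketched in a few lines of Python. This is a toy illustration, not SingleStore's storage code: the table contents and the column names are invented, and run-length encoding is only one assumed example of the kind of compression that a sorted, columnar layout makes possible.

```python
# Toy illustration only: plain Python objects stand in for storage formats.
# The rows and column names below are invented for the example.
rows = [
    {"patient_id": 1, "age": 34, "systolic_bp": 118},
    {"patient_id": 2, "age": 51, "systolic_bp": 135},
    {"patient_id": 3, "age": 47, "systolic_bp": 135},
    {"patient_id": 4, "age": 29, "systolic_bp": 135},
]


def to_columns(row_list):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in row_list] for name in row_list[0]}


def run_length_encode(values):
    """Compress consecutive repeated values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


columns = to_columns(rows)

# Row layout: a point lookup returns one record that holds every field.
record = next(row for row in rows if row["patient_id"] == 3)

# Column layout: an aggregate reads only the single column it needs.
bp_values = columns["systolic_bp"]
mean_bp = sum(bp_values) / len(bp_values)

# Similar adjacent values in one column compress well.
encoded_bp = run_length_encode(bp_values)
```

The point lookup benefits from having the whole record together, while the aggregate and the compression step benefit from having all values of one column stored contiguously.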

Indexing

Rather than the traditional B-tree index, SingleStore rowstores use skiplists optimized for fast, lock-free processing in memory. Columnstores store data indexed in sorted segments, in order to maximize on-disk compression and achieve fast ordered scans. SingleStore also supports using hash indexes as secondary indexes to speed up certain queries.
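As a rough illustration of the data structure, the following is a minimal single-threaded skip list in Python. It is a sketch under stated assumptions: the class and method names are hypothetical, the maximum height and the 0.5 promotion probability are arbitrary choices, and it makes no attempt to be lock-free, which is the property the production index relies on.

```python
import random


class _Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        # forward[i] is the next node on level i (level 0 holds every node).
        self.forward = [None] * height


class SkipList:
    """Ordered map backed by a skip list (single-threaded sketch)."""

    MAX_HEIGHT = 16  # arbitrary cap chosen for the example

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_HEIGHT)
        self._height = 1

    def _random_height(self):
        # Each extra level is kept with probability 0.5.
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.5:
            height += 1
        return height

    def _find_predecessors(self, key):
        # For every level, find the last node whose key is smaller than `key`.
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in range(self._height - 1, -1, -1):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            update[level] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        height = self._random_height()
        self._height = max(self._height, height)
        node = _Node(key, value, height)
        for level in range(height):
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node

    def get(self, key, default=None):
        candidate = self._find_predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def items(self):
        # Level 0 links every node in key order, giving an ordered scan.
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

Lookups and inserts skip over long runs of nodes on the upper levels, which gives expected logarithmic cost without the rebalancing a B-tree needs.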

Distributed architecture

A SingleStore database is distributed across many commodity machines. Data is stored in partitions on leaf nodes, and users connect to aggregator nodes. The same software is installed on aggregator and leaf nodes; administrators designate each machine's role in the cluster during setup. An aggregator node is responsible for receiving SQL queries, breaking them up across leaf nodes, and aggregating results back to the client. A leaf node stores SingleStore data and processes queries from the aggregator(s). All communication between aggregators and leaf nodes is done over the network using SQL. SingleStore uses hash partitioning to distribute data uniformly across the leaf nodes. [14]
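The aggregator and leaf roles can be sketched as a small scatter-gather model. Everything here is an assumption made for illustration: the class names, the partition count, and the use of CRC32 as the hash function are hypothetical, and in-process method calls stand in for the SQL that the real nodes exchange over the network.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the example


def partition_for(shard_key):
    """Map a key to a partition with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(shard_key).encode("utf-8")) % NUM_PARTITIONS


class Leaf:
    """Stores rows for the partitions assigned to it and answers partial queries."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition id -> list of rows

    def insert(self, partition_id, row):
        self.partitions[partition_id].append(row)

    def partial_sum(self, column):
        return sum(row[column] for rows in self.partitions.values() for row in rows)


class Aggregator:
    """Routes writes by hash and combines partial results from every leaf."""

    def __init__(self, leaves):
        self.leaves = leaves

    def _leaf_for(self, partition_id):
        # Partitions are spread round-robin over the available leaves.
        return self.leaves[partition_id % len(self.leaves)]

    def insert(self, row, shard_key_column):
        partition_id = partition_for(row[shard_key_column])
        self._leaf_for(partition_id).insert(partition_id, row)

    def total(self, column):
        # Scatter the query to all leaves, then gather and merge the partial sums.
        return sum(leaf.partial_sum(column) for leaf in self.leaves)
```

Because the hash of a given key is stable, every row with that key lands in the same partition, and an aggregate only needs one partial result per leaf.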

Real-time streaming data ingestion

SingleStore Pipelines is a built-in integration technology that provides parallel streaming data ingestion from distributed data sources. [5] It provides live de-duplication as data is ingested and exactly-once semantics from message brokers, and it simplifies architectures by reducing or eliminating the need for ETL middleware. Transformation and ML integration can be done via SingleStore Pipeline Transforms by embedding a binary. SingleStore Pipelines connect to data sources such as Apache Kafka, Apache Spark, Amazon S3 buckets, Microsoft Azure Blob Storage, Google Cloud Storage, HDFS, or files on disk, and support formats such as JSON, Parquet, Avro, and CSV. Because of the lock-free skip lists, queries can retrieve data as soon as it lands and are not blocked while data is being ingested. [1] [15]
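One common way to obtain exactly-once loading from an offset-based broker is to record the consumed offset together with the loaded data, so that a replayed batch is recognized and skipped. The sketch below assumes that approach; it is not taken from SingleStore's implementation, and the class name, the `id` field used for de-duplication, and the offset bookkeeping are all hypothetical.

```python
class PipelineSink:
    """Toy sink that loads batches idempotently (hypothetical design)."""

    def __init__(self):
        self.table = {}      # primary key -> row; upserts de-duplicate by key
        self.committed = {}  # source partition -> next offset still to load

    def load_batch(self, source_partition, start_offset, records):
        """Load a batch and return how many records were newly applied."""
        expected = self.committed.get(source_partition, 0)
        if start_offset > expected:
            raise ValueError("gap in offsets: earlier records were never loaded")
        loaded = 0
        for offset, record in enumerate(records, start=start_offset):
            if offset < expected:
                continue  # already applied by an earlier attempt
            self.table[record["id"]] = record
            loaded += 1
        # A real system would commit rows and offset in one transaction;
        # here both live in memory and are simply updated together.
        self.committed[source_partition] = max(expected, start_offset + len(records))
        return loaded
```

For example, loading the same batch twice applies its records only once, and a batch that overlaps an earlier one contributes only its new records.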

Bottomless storage

Bottomless storage separates storage and compute for SingleStore. [16] Data files are persisted asynchronously to S3, comparable blob storage, or NFS. The "blobs" are the compressed, encoded data structures that back the columnstore. High availability is maintained in the SingleStore cluster for the most recent data, while long-term storage moves to blob storage. Blobs that are not queried are automatically deleted from a SingleStore node's local disk, allowing the cluster to hold more data than its available disk space and making the cluster's storage "bottomless." New replicas do not need to download all blob files to come online, which makes creating and moving partitions faster. Bottomless storage acts as a "continuous backup" that obviates the need for traditional disaster recovery and backup cloud-operation procedures. It also supports larger, petabyte-sized datasets for historical analytics. [5]
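The relationship between local disk and blob storage can be pictured as a bounded cache in front of an unbounded store. The following sketch assumes a least-recently-used eviction policy, which the source does not specify; the class name is hypothetical, a dictionary stands in for the remote blob store, and the upload happens synchronously for simplicity.

```python
from collections import OrderedDict


class BottomlessStore:
    """Bounded local cache over an unbounded blob store (illustrative only)."""

    def __init__(self, local_capacity):
        self.remote = {}            # stands in for S3-like blob storage
        self.local = OrderedDict()  # blob name -> bytes, least recently used first
        self.local_capacity = local_capacity
        self.remote_fetches = 0

    def write(self, name, data):
        # The real system uploads asynchronously; this sketch uploads inline.
        self.remote[name] = data
        self._cache(name, data)

    def read(self, name):
        if name in self.local:
            self.local.move_to_end(name)  # mark as recently used
            return self.local[name]
        data = self.remote[name]          # cold blob: fetch it from remote storage
        self.remote_fetches += 1
        self._cache(name, data)
        return data

    def _cache(self, name, data):
        self.local[name] = data
        self.local.move_to_end(name)
        # Evict the coldest blobs once local capacity is exceeded.
        while len(self.local) > self.local_capacity:
            self.local.popitem(last=False)
```

The total amount of data is limited only by the remote store, while local disk holds just the blobs that queries have touched recently.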

Durability

Durability for the in-memory rowstore is implemented with a write-ahead log and snapshots, similar to checkpoints. With default settings, as soon as a transaction is acknowledged in memory, the database will asynchronously write the transaction to disk as fast as the disk allows. [17]
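The interplay of a write-ahead log and snapshots can be sketched with a small key-value store. This is a minimal model, not SingleStore's on-disk format: the class name, the file names, and the JSON encoding are assumptions, and the `sync` flag only approximates the difference between asynchronous and synchronous log writes.

```python
import json
import os
import tempfile


class DurableKV:
    """In-memory dict made durable by a log plus snapshots (sketch; string keys)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Load the latest snapshot, then replay log entries written after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value, sync=False):
        self._log.write(json.dumps([key, value]) + "\n")
        if sync:
            # Force the entry to disk before acknowledging the write.
            self._log.flush()
            os.fsync(self._log.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the full state atomically, then start a fresh, empty log.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self._log.flush()
        self._log.close()


# Usage: write, snapshot, write again, then reopen and recover both values.
with tempfile.TemporaryDirectory() as directory:
    store = DurableKV(directory)
    store.put("a", 1)
    store.snapshot()
    store.put("b", 2, sync=True)
    store.close()
    reopened = DurableKV(directory)
    recovered = dict(reopened.data)
    reopened.close()
```

Recovery cost is bounded by the size of the latest snapshot plus the log written since, which is why snapshots are taken periodically rather than relying on the log alone.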

The on-disk columnstore is actually fronted by an in-memory rowstore-like structure, indexed using a skiplist. This structure has the same durability guarantees as the SingleStore rowstore. Apart from that, the columnstore is durable, since its data is stored on disk.

Replication

A SingleStore cluster can be configured in "High Availability" (HA) mode, where every data partition is automatically created with master and slave versions on two separate leaf nodes. In HA mode, aggregators send transactions to the master partitions, which then send logs to the slave partitions. In the event of an unexpected master failure, the slave partitions take over as master partitions, in a fully online operation with no downtime. [5]
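The failover behaviour described above can be modelled in a few lines. This is a simplified sketch with hypothetical class names: the replica corresponds to the slave partition in the text, log shipping is an in-process list append, and failure detection is reduced to a boolean flag.

```python
class Partition:
    """One copy of a data partition, living on a named leaf node."""

    def __init__(self, leaf):
        self.leaf = leaf
        self.log = []      # applied transactions, in order
        self.alive = True


class ReplicatedPartition:
    """A master copy plus one replica on a different leaf (illustrative only)."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def write(self, record):
        if not self.master.alive:
            self.failover()
        self.master.log.append(record)
        if self.replica is not None and self.replica.alive:
            self.replica.log.append(record)  # the master ships its log to the replica

    def failover(self):
        # Promote the replica; it already holds every shipped transaction.
        if self.replica is None or not self.replica.alive:
            raise RuntimeError("no live copy of this partition remains")
        self.master, self.replica = self.replica, None
```

After a failover the promoted copy keeps accepting writes, so clients that go through the aggregator see no interruption in this model.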

Distribution formats

Figure: SingleStore San Francisco office in 2020.

SingleStore can be downloaded for free and run on Linux for systems of up to 4 leaf nodes with 32 GB of RAM each; an Enterprise license is required for larger deployments and for official SingleStore support. SingleStore clusters can be managed in containers using the SingleStore Kubernetes Operator. SingleStore is also available as a managed service named SingleStore Managed Service, available in various regions in Google Cloud and Amazon Web Services, with a Microsoft Azure implementation promised for the near future. The underlying engine and potential system performance are identical in all distribution formats. [1]

SingleStore ships with a set of installation, management, and monitoring tools called SingleStore Tools. When installing SingleStore, Tools can be used to set up the distributed SingleStore database across machines. SingleStore also provides a browser-based query and management UI called SingleStore Studio, which provides query processing and database monitoring, and shows health and informational details about the running cluster. [1]

References

  1. "Why Is Better Data Management Silicon Valley's New Obsession?". Inno & Tech Today. 18 April 2022. Retrieved 26 April 2022.
  2. "Enterprise Technology: Revenge of the Nerdiest Nerds". Business Week. Archived from the original on July 1, 2012. Retrieved 26 April 2022.
  3. "IBM invests in SingleStore to get faster AI and analytics on distributed data". Retrieved 2017-09-29.
  4. Lunden, Ingrid (8 September 2021). "Real-time database platform SingleStore raises $80M more, now at a $940M valuation". TechCrunch. Retrieved 8 September 2021.
  5. "BOTTOMLESS STORAGE AND PIPELINE: THE QUEST FOR A NEW DATABASE PARADIGM". Dataconomy. 20 April 2022. Retrieved 26 April 2022.
  6. "Database Firm SingleStore Scores $80M in Series F Funding". Datanami. 10 September 2021. Retrieved 26 April 2022.
  7. Hainzinger, Brittany (2020). "MemSQL Is Now SingleStore" (published 2020-11-02). Retrieved 2022-04-23.
  8. "SingleStore raises $80M for distributed SQL database". TechTarget. Retrieved 26 April 2022.
  9. "MemSQL rebrands as SingleStore". Software Development Times. 30 October 2020. Retrieved 26 April 2022.
  10. "SingleStore Could Double Employee Count in Raleigh". News Observer. Retrieved 26 April 2022.
  11. "Database Startup SingleStore Raises $75M". VentureBeat. 8 September 2021. Retrieved 26 April 2022.
  12. "SingleStore, formerly MemSQL, raises $80M to integrate and leverage companies' disparate data silos". TechCrunch. 8 December 2020. Retrieved 27 April 2022.
  13. "SingleStore helps enterprises better manage growing data volumes". VentureBeat. 12 July 2022. Retrieved 26 July 2022.
  14. "Introduction to MemSQL | DBMS 2 : DataBase Management System Services". DBMS. Retrieved 26 April 2022.
  15. "What's Changed: 2021 Gartner Magic Quadrant for Cloud Database Management Systems". Solutions Review. 13 January 2022. Retrieved 26 April 2022.
  16. "Why We Need Management And Scalability To Benefit From The Power Of Data". Forbes. Retrieved 26 April 2022.
  17. "A blazingly fast database in a data-driven world". IBM. Retrieved 2018-01-19.