Wide-column store

Last updated December 06, 2024

A wide-column store (or extensible record store) is a column-oriented DBMS and therefore a special type of NoSQL database.^[1] It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store.^[1] Google's Bigtable is one of the prototypical examples of a wide-column store.^[2]

Wide-column stores versus columnar databases

Wide-column stores such as Bigtable and Apache Cassandra are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. In genuine column stores, a columnar data layout is adopted such that each column is stored separately on disk. Wide-column stores do often support the notion of column families that are stored separately. However, each such column family typically contains multiple columns that are used together, similar to traditional relational database tables. Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately.

Wide-column stores that support column families are also known as column family databases.^{[ citation needed ]}

Notable examples

Notable wide-column stores ^[3] include:

Apache Accumulo
Apache Cassandra
Apache HBase
Bigtable
DataStax Enterprise (uses Apache Cassandra)
DataStax Astra DB (uses Apache Cassandra)
Hypertable
Azure Tables
ScyllaDB

Related Research Articles

In a relational database, a column is a set of data values of a particular type, one value for each row of a table. A column may contain text values, numbers, or even pointers to files in the operating system. Columns typically contain simple types, though some relational database systems allow columns to contain more complex data types, such as whole documents, images, or even video clips. A column can also be called an attribute.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.

Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.

Data orientation refers to how tabular data is represented in a linear memory model such as in-disk or in-memory.The two most common representations are column-oriented and row-oriented.

A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

Apache Cassandra is a free and open-source database management system designed to handle large volumes of data across multiple commodity servers. Cassandra supports computer clusters and spans multiple data centers, featuring asynchronous and masterless replication. It enables low-latency operations for all clients and incorporates Amazon's Dynamo distributed storage and replication techniques, combined with Google's Bigtable data storage engine model.

NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational database design does not require a schema, it offers rapid scalability to manage large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

A keyspace in a NoSQL data store is an object that holds together all column families of a design. It is the outermost grouping of the data in the data store. It resembles the schema concept in Relational database management systems. Generally, there is one keyspace per application.

The standard column family is a NoSQL object that contains columns of related data. It is a tuple (pair) that consists of a key–value pair, where the key is mapped to a value that is a set of columns. In analogy with relational databases, a standard column family is as a "table", each key–value pair being a "row". Each column is a tuple consisting of a column name, a value, and a timestamp. In a relational database table, this data would be grouped together within a table with other non-related data.

Within database management systems, the record columnar file or RCFile is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile structure includes a data storage format, data compression approach, and optimization techniques for data reading. It is able to meet all the four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) a strong adaptivity to dynamic data access patterns.

LevelDB is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat. Inspired by Bigtable, LevelDB source code is hosted on GitHub under the New BSD License and has been ported to a variety of Unix-based systems, macOS, Windows, and Android.

A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.

Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms. According to DB-Engines ranking, Accumulo is the third most popular NoSQL wide column store behind Apache Cassandra and HBase and the 67th most popular database engine of any type (complete) as of 2018.

SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause.

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

References

1 2 Wide Column Stores. DB-Engines Encyclopedia.
↑ Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data.
↑ DB-Engines Ranking of Wide Column Stores.

This computing article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[db-engines-1] 1 2 Wide Column Stores. DB-Engines Encyclopedia.

[2] Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data.

[3] DB-Engines Ranking of Wide Column Stores.

[1]

[2]

[3]

v t e Database management systems
Types	Object-oriented comparison Relational list comparison Key–value Column-oriented list Document-oriented Wide-column store Graph NoSQL NewSQL In-memory list Multi-model comparison Cloud Blockchain-based database
Concepts	Database ACID Armstrong's axioms Codd's 12 rules CAP theorem CRUD Null Candidate key Foreign key PACELC theorem Superkey Surrogate key Unique key
Objects	Relation table column row View Transaction Transaction log Trigger Index Stored procedure Cursor Partition
Components	Concurrency control Data dictionary JDBC XQJ ODBC Query language Query optimizer Query rewriting system Query plan
Functions	Administration Query optimization Replication Sharding
Related topics	Database models Database normalization Database storage Distributed database Federated database system Referential integrity Relational algebra Relational calculus Relational model Object–relational database Transaction processing
Category Outline

Wide-column store

Contents