Column-oriented DBMS

Last updated

A column-oriented DBMS or columnar DBMS is a database management system (DBMS) that stores data tables by column rather than by row. Benefits include more efficient access to data when only querying a subset of columns (by eliminating the need to read columns that are not relevant), and more options for data compression. However, they are typically less efficient for inserting new data.

Contents

Practical use of a column store versus a row store differs little in the relational DBMS world. Both columnar and row databases can use traditional database query languages like SQL to load data and perform queries. Both row and columnar databases can become the backbone in a system to serve data for common extract, transform, load (ETL) and tools.

Description

Background

A relational database management system provides data that represents a two-dimensional table of columns and rows. For example, a database might have this table:

RowIdEmpIdLastnameFirstnameSalary
00110SmithJoe60000
00212JonesMary80000
00311JohnsonCathy94000
00422JonesBob55000

This simple table includes an employee identifier (EmpId), name fields (Lastname and Firstname) and a salary (Salary). This two-dimensional format is an abstraction. In an actual implementation, storage hardware requires the data to be serialized into one form or another.

The most expensive operations involving hard disks are seeks. In order to improve overall performance, related data should be stored in a fashion to minimize the number of seeks. This is known as locality of reference, and the basic concept appears in a number of different contexts. Hard disks are organized into a series of blocks of a fixed size, typically enough to store several rows of the table. By organizing the table's data so rows fit within these blocks, and grouping related rows onto sequential blocks, the number of blocks that need to be read or sought is minimized in many cases, along with the number of seeks.

A survey by Pinnecke et al. [1] covers techniques for column-/row hybridization as of 2017.

Row-oriented systems

A common method of storing a table is to serialize each row of data, like this:

001:10,Smith,Joe,60000; 002:12,Jones,Mary,80000; 003:11,Johnson,Cathy,94000; 004:22,Jones,Bob,55000;

As data is inserted into the table, it is assigned an internal ID, the rowid that is used internally in the system to refer to data. In this case the records have sequential rowids independent of the user-assigned empid. In this example, the DBMS uses short integers to store rowids. In practice, larger numbers, 64-bit or 128-bit, are normally used.

Row-oriented systems are designed to efficiently return data for an entire row, or record, in as few operations as possible. This matches the common use-case where the system is attempting to retrieve information about a particular object, say the contact information for a user in a rolodex system, or product information for an online shopping system. By storing the record's data in a single block on the disk, along with related records, the system can quickly retrieve records with a minimum of disk operations.

Row-oriented systems are not efficient at performing set-wide operations on the whole table, as opposed to a small number of specific records. For instance, in order to find all records in the example table with salaries between 40,000 and 50,000, the DBMS would have to fully scan through the entire table looking for matching records. While the example table shown above will likely fit in a single disk block, a table with even a few hundred rows would not, and multiple disk operations would be needed to retrieve the data and examine it.

To improve the performance of these sorts of operations (which are very common, and generally the point of using a DBMS), most DBMSs support the use of database indexes, which store all the values from a set of columns along with rowid pointers back into the original table. An index on the salary column would look something like this:

55000:004;  60000:001; 80000:002; 94000:003;

As they store only single pieces of data, rather than entire rows, indexes are generally much smaller than the main table stores. Scanning this smaller set of data reduces the number of disk operations. If the index is heavily used, it can dramatically reduce the time for common operations. However, maintaining indexes adds overhead to the system, especially when new data is written to the database. Records not only need to be stored in the main table, but any attached indexes have to be updated as well.

The main reason why indexes dramatically improve performance on large datasets is that database indexes on one or more columns are typically sorted by value, which makes range queries operations (like the above "find all records with salaries between 40,000 and 50,000" example) very fast (lower time-complexity).

A number of row-oriented databases are designed to fit entirely in RAM, an in-memory database. These systems do not depend on disk operations, and have equal-time access to the entire dataset. This reduces the need for indexes, as it requires the same amount of operations to fully scan the original data as a complete index for typical aggregation purposes. Such systems may be therefore simpler and smaller, but can only manage databases that will fit in memory.

Column-oriented systems

A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on. For our example table, the data would be stored in this fashion:

001:10,002:12,003:11,004:22; 001:Smith,002:Jones,003:Johnson,004:Jones, 001:Joe,002:Mary,003:Cathy,004:Bob; 001:60000,002:80000,003:94000,004:55000;

In this layout, any one of the columns more closely matches the structure of an index in a row-oriented system. This may cause confusion that can lead to the mistaken belief a column-oriented store "is really just" a row-store with an index on every column. However, it is the mapping of the data that differs dramatically. In a row-oriented system, indices map column values to rowids, whereas in a column-oriented system, columns map rowids to column values. [2] This may seem subtle, but the difference can be seen in this common modification to the same store wherein the two "Jones" items, above, are compressed into a single item with two rowids:

…;Smith:001;Jones:002,004;Johnson:003;…

Whether or not a column-oriented system will be more efficient in operation depends heavily on the workload being automated. Operations that retrieve all the data for a given object (the entire row) are slower. A row-oriented system can retrieve the row in a single disk read, whereas numerous disk operations to collect data from multiple columns are required from a columnar database. However, these whole-row operations are generally rare. In the majority of cases, only a limited subset of data is retrieved. In a rolodex application, for instance, collecting the first and last names from many rows to build a list of contacts is far more common than reading all data for any single address. This is even more true for writing data into the database, especially if the data tends to be "sparse" with many optional columns. For this reason, column stores have demonstrated excellent real-world performance in spite of many theoretical disadvantages. [3]

Partitioning, indexing, caching, views, OLAP cubes, and transactional systems such as write-ahead logging or multiversion concurrency control all dramatically affect the physical organization of either system. That said, online transaction processing (OLTP)-focused RDBMS systems are more row-oriented, while online analytical processing (OLAP)-focused systems are a balance of row-oriented and column-oriented.

Benefits

Access time

Comparisons between row-oriented and column-oriented databases are typically concerned with the efficiency of hard-disk access for a given workload, as seek time is incredibly long compared to the other bottlenecks in computers. For example, a typical Serial ATA (SATA) hard drive has an average seek time of between 16 and 22 milliseconds [4] while DRAM access on an Intel Core i7 processor takes on average 60 nanoseconds, nearly 400,000 times as fast. [5] Clearly, disk access is a major bottleneck in handling big data. Columnar databases boost performance by reducing the amount of data that needs to be read from disk, both by efficiently compressing the similar columnar data and by reading only the data necessary to answer the query.

In practice, columnar databases are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve highly complex queries over all data (possibly petabytes). However, some work must be done to write data into a columnar database. Transactions (INSERTs) must be separated into columns and compressed as they are stored, making it less suited for OLTP workloads. Row-oriented databases are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. For example, retrieving all data from a single row is more efficient when that data is located in a single location (minimizing disk seeks), as in row-oriented architectures. However, column-oriented systems have been developed as hybrids capable of both OLTP and OLAP operations. Some of the OLTP constraints, faced by such column-oriented systems, are mediated using (amongst other qualities) in-memory data storage. [6] Column-oriented systems suitable for both OLAP and OLTP roles effectively reduce the total data footprint by removing the need for separate systems. [7]

Compression

Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column-oriented data that are not available in row-oriented data. For example, many popular modern compression schemes, such as LZW or run-length encoding, make use of the similarity of adjacent data to compress. Missing values and repeated values, common in clinical data, can be represented by a two-bit marker. [8] While the same techniques may be used on row-oriented data, a typical implementation will achieve less effective results. [9] [10]

To improve compression, sorting rows can also help. For example, using bitmap indexes, sorting can improve compression by an order of magnitude. [11] To maximize the compression benefits of the lexicographical order with respect to run-length encoding, it is best to use low-cardinality columns as the first sort keys. [12] For example, given a table with columns sex, age, name, it would be best to sort first on the value sex (cardinality of two), then age (cardinality of <128), then name.

Columnar compression achieves a reduction in disk space at the expense of efficiency of retrieval. The greater adjacent compression achieved, the more difficult random-access may become, as data might need to be uncompressed to be read. Therefore, column-oriented architectures are sometimes enriched by additional mechanisms aimed at minimizing the need for access to compressed data. [13]

History

Column stores or transposed files have been implemented from the early days of DBMS development. TAXIR was the first application of a column-oriented database storage system with focus on information-retrieval in biology [14] in 1969. Clinical data from patient records with many more attributes than could be analyzed were processed in 1975 and after by a time-oriented database system (TODS). [8] Statistics Canada implemented the RAPID system [15] in 1976 and used it for processing and retrieval of the Canadian Census of Population and Housing as well as several other statistical applications. RAPID was shared with other statistical organizations throughout the world and used widely in the 1980s. It continued to be used by Statistics Canada until the 1990s.

Another column-oriented database was SCSS. [16] [17] [18]

Later column-oriented database packages included:

Since about 2004 there have been additional open source and commercial implementations. MonetDB was released under an open-source license on September 30, 2004, [19] followed closely by the now defunct C-Store. [20]

C-store was a university project that eventually, with team member Michael Stonebraker staying on, led to Vertica, which he co-founded in 2005. [21] [22]

The MonetDB-related X100 project evolved into VectorWise. [23] [24] Druid is a column-oriented data store that was open-sourced in late 2012 and is now used by numerous organizations. [25]

Classic Relational DBMS can use column-oriented strategies by mixing row-oriented and column-oriented tables. Despite the DBMS complexity, this approach has proven to be valuable from the years 2010 to present. For example in 2014 Citusdata introduced column-oriented tables for PostgreSQL [26] and McObject added support for columnar storage with its release of eXtremeDB Financial Edition in 2012 [27]  which was then used to establish a new standard of performance for the independently audited STAC-M3 benchmark. [28]

See also

Related Research Articles

<span class="mw-page-title-main">Data warehouse</span> Centralized storage of knowledge

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise. This is beneficial for companies as it enables them to interrogate and draw insights from their data and make decisions.

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

In a relational database, a column is a set of data values of a particular type, one value for each row of the database. A column may contain text values, numbers, or even pointers to files in the operating system. Columns typically contain simple types, though some relational database systems allow columns to contain more complex data types, such as whole documents, images, or even video clips. A column can also be called an attribute.

<span class="mw-page-title-main">MonetDB</span> Open source column-oriented relational database management system

MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows. MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information system (GIS), Resource Description Framework (RDF), text retrieval and sequence alignment processing.

A bitmap index is a special kind of database index that uses bitmaps.

Online transaction processing (OLTP) is a type of database system used in transaction-oriented applications, such as many operational systems. "Online" refers to that such systems are expected to respond to user requests and process them in real-time. The term is contrasted with online analytical processing (OLAP) which instead focuses on data analysis.

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

Infobright is a commercial provider of column-oriented relational database software with a focus in machine-generated data. The company's head office is located in Toronto, Ontario, Canada. Most of its research and development is based in Warsaw, Poland. Support personnel are located in various offices around the world.

Within database management systems, the RCFile is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile structure includes a data storage format, data compression approach, and optimization techniques for data reading. It is able to meet all the four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) a strong adaptivity to dynamic data access patterns.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">SingleStore</span> Database management system

SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

<span class="mw-page-title-main">Log-structured merge-tree</span> Data structure

In computer science, the log-structured merge-tree is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches.

<span class="mw-page-title-main">SAP HANA</span> Database management system by SAP

SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE. Its primary function as the software running a database server is to store and retrieve data as requested by the applications. In addition, it performs advanced analytics and includes extract, transform, load (ETL) capabilities as well as an application server.

A wide-column store is a column-oriented DBMS and therefore a special type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store. Google's Bigtable is one of the prototypical examples of a wide-column store.

<span class="mw-page-title-main">ClickHouse</span> Open-source database management system

ClickHouse is an open-source column-oriented DBMS for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in the San Francisco Bay Area with the subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands.

References

  1. Marcus Pinnecke; David Broneske; Gabriel Campero Durand; Gunter Saake (2017). Are Databases Fit for Hybrid Workloads on GPUs? A Storage Engine's Perspective (PDF). IEEE 33rd International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2017.237.
  2. Daniel Abadi; Samuel Madden (31 July 2008). "Debunking Another Myth: Column-Stores vs. Vertical Partitioning". The Database Column. Archived from the original on December 4, 2008.
  3. Stavros Harizopoulos; Daniel Abadi; Peter Boncz. "Column-Oriented Database Systems" (PDF). VLDB 2009 Tutorial. p. 5.
  4. Masiero, Manuel (January 8, 2013). "Western Digital's 4 TB WD4001FAEX Review: Back In Black". Tom's Hardware.
  5. Levinthal, David (2009). "Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors" (PDF). Intel. p. 22. Retrieved 2017-11-10.
  6. "Compacting Transactional Data in Hybrid OLTP&OLAP Databases" (PDF). Retrieved August 1, 2017.
  7. "A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database" (PDF). Retrieved August 1, 2017.
  8. 1 2 Stephen Weyl; James F. Fries; Gio Wiederhold; Frank Germano (1975). "A Modular Self-describing Clinical Database System". Computers and Biomedical Research. 8 (3): 279–293. doi:10.1016/0010-4809(75)90045-2. PMID   1157469.
  9. D. J. Abadi; S. R. Madden; N. Hachem (2008). Column-stores vs. row-stores: how different are they really?. pp. 967–980.{{cite book}}: |work= ignored (help)
  10. Bruno, N (2009). "Teaching an old elephant new tricks". arXiv: 0909.1758 [cs.DB].
  11. Daniel Lemire, Owen Kaser, Kamel Aouiche, "Sorting improves word-aligned bitmap indexes", Data & Knowledge Engineering, Volume 69, Issue 1 (2010), pp. 3-28.
  12. Daniel Lemire and Owen Kaser, Reordering Columns for Smaller Indexes, Information Sciences 181 (12), 2011
  13. Dominik Ślęzak; Jakub Wróblewski; Victoria Eastwood; Piotr Synak (2008). Brighthouse: an analytic data warehouse for ad hoc queries (PDF). Proceedings of the 34th VLDB Conference. Auckland, New Zealand. Archived from the original (PDF) on 2016-05-07. Retrieved 2009-05-04.
  14. George F. Estabrook; Robert C. Brill (November 1969). "The theory of the TAXIR accessioner". Mathematical Biosciences. 5 (3–4): 327–340. doi:10.1016/0025-5564(69)90050-9.
  15. "A DBMS for large statistical databases". acm.org. Vldb '79. 1979. pp. 319–327.
  16. already on the market by September 1977
  17. Nie, Norman H. (1980). SCSS: A User's Guide to the SPSS Conversational Statistical System. McGraw-Hill. ISBN   978-0070465336.
  18. "SCSS from SPSS, Inc". ComputerWorld. September 26, 1977. p. 28.
  19. "A short history about us". monetdb.org.
  20. "C-Store". mit.edu. Archived from the original on 2012-03-05. Retrieved 2008-01-22.
  21. "The Vertica Analytic Database: C-Store 7 Years Later" (PDF)" (PDF). VLDB.org. August 28, 2012.
  22. Charles Babcock (February 21, 2008). "Database Pioneer Rethinks The Best Way To Organize Data". InformationWeek . Retrieved 2018-12-08.
  23. Marcin Zukowski; Peter Boncz (May 20, 2012). "From x100 to vectorwise". Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM. pp. 861–862. doi:10.1145/2213836.2213967. ISBN   978-1-4503-1247-9. S2CID   9187072.
  24. D. Inkster; M. Zukowski; P.A. Boncz (September 20, 2011). "Integration of VectorWise with Ingres". ACM SIGMOD Record. 40 (3): 45. CiteSeerX   10.1.1.297.4985 . doi:10.1145/2070736.2070747. S2CID   6372175.
  25. "Druid". druid.io.
  26. "Citusdata". github.com.
  27. Saujani, Sandeep (19 June 2012). "McObject eXtremeDB Financial Edition In-Memory DBMS Breaks Through Capital Markets' Data Management Bottleneck". bobs guide.
  28. STAC Benchmark Council, Leadership (3 November 2012). "McObject eXtremeDB 5.0 Financial Edition with Kove XPD L2 Storage System, Dell PowerEdge R910 Server and Mellanox ConnectX-2 and MIS5025Q QDR InfiniBand Switch". STAC.