ClickHouse

Last updated
Clickhouse
Developer(s) ClickHouse, Inc.
Initial releaseJune 15, 2016;7 years ago (2016-06-15)
Stable release
v24.4.1.2088-stable / May 1, 2024;3 days ago (2024-05-01) [1]
Repository github.com/ClickHouse/ClickHouse/
Written in C++
Operating system Linux, FreeBSD, macOS
License Apache License 2.0
Website clickhouse.com

ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in the San Francisco Bay Area with the subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands.

Contents

In September 2021 in San Francisco, CA, ClickHouse incorporated to house the open source technology with an initial $50 million investment from Index Ventures and Benchmark Capital with participation by Yandex N.V. [2] and others. On October 28, 2021 the company received Series B funding totaling $250 million at a valuation of $2 billion from Coatue Management, Altimeter Capital, and other investors. The company continues to build the open source project and engineering cloud technology.

History

ClickHouse’s technology was first developed over 10 years ago at Yandex, Russia's largest technology company. [3] In 2009, Alexey Milovidov and developers started an experimental project to check the hypothesis if it was viable to generate analytical reports in real-time from non-aggregated data that is also constantly added in real-time. The developers spent 3 years to prove this hypothesis, and in 2012 ClickHouse launched in production for the first time to power Yandex.Metrica, the second-largest web analytics platform in the world, after Google Analytics.

Unlike custom data structures used before, ClickHouse was applicable more generally to work as a database management system. The power and utility of ClickHouse offered a true column-oriented DBMS, it allowed for systems to generate reports from petabytes of raw data with sub-second latencies. ClickHouse was widely adopted at Yandex including for Yandex.Tank load testing tool and Yandex.Market to monitor site accessibility and KPIs.

In 2016, the ClickHouse project was released as open-source software under the Apache 2 license in June 2016 to power analytical use cases around the globe. The systems at the time offered a server throughput of a hundred thousand rows per second, ClickHouse out performed that speed with a throughput of hundreds of millions of rows per second.

Since ClickHouse became available as open source in 2016, its popularity has grown exponentially, as evidenced through adoption by industry-leading companies like Uber, Comcast, eBay, and Cisco. [4] ClickHouse was also implemented at CERN's LHCb experiment to store and process metadata on 10 billion events with over 1000 attributes per event.

Features

The main features of the ClickHouse DBMS are: [5]

Limitations

ClickHouse has some features that can be considered disadvantages:

Use cases

ClickHouse was designed for OLAP queries. [5]

One of the common cases for ClickHouse is server log analysis. After setting regular data uploads to ClickHouse (it's recommended to insert data in fairly large batches with more than 1000 rows), it's possible to analyze incidents with instant queries or monitor a service's metrics, such as error rates, response times, and so on.

ClickHouse can also be used as an internal data warehouse for in-house analysts. ClickHouse can store data from different systems (such as Hadoop or certain logs) and analysts can build internal dashboards with the data or perform real-time analysis for business purposes.

Benchmark results

According to benchmark tests conducted by its developers, [6] for OLAP queries ClickHouse is more than 100 times faster than Hive (a DBMS based on the Hadoop technology stack) or MySQL (a common RDBMS).

See also

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The product name was originally styled as DB2 until 2017, when it changed to its present form.

In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.

In computing, online analytical processing, or OLAP, is an approach to quickly answer multi-dimensional analytical (MDA) queries. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

<span class="mw-page-title-main">Virtuoso Universal Server</span> Computer software

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.

A column-oriented DBMS or columnar DBMS is a database management system (DBMS) that stores data tables by column rather than by row. Benefits include more efficient access to data when only querying a subset of columns, and more options for data compression. However, they are typically less efficient for inserting new data.

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

<span class="mw-page-title-main">Netezza</span> Provider of Integrated Data Warehouse Hardware and Software

IBM Netezza is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for the most demanding analytic uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.

<span class="mw-page-title-main">Exasol</span> Database management software company

Exasol is an analytics database management software company. Its product is called Exasol, an in-memory, column-oriented, relational database management system

<span class="mw-page-title-main">Vertica</span> Software company

Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.

The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information.

Java Database Connectivity (JDBC) is an application programming interface (API) for the Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. It is part of the Java Standard Edition platform, from Oracle Corporation. It provides methods to query and update data in a database, and is oriented toward relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the Java virtual machine (JVM) host environment.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">SingleStore</span> Database management system

SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

<span class="mw-page-title-main">SAP HANA</span> Database management system by SAP

SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE. Its primary function as the software running a database server is to store and retrieve data as requested by the applications. In addition, it performs advanced analytics and includes extract, transform, load (ETL) capabilities as well as an application server.

<span class="mw-page-title-main">Apache Pinot</span> Open-source distributed data store

Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion. The name Pinot comes from the Pinot grape vines that are pressed into liquid that is used to produce a variety of different wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of different file formats or streaming data sources.

DuckDB is an open-source column-oriented relational database management system (RDBMS) originally developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands and first released in 2019. Those behind the project say it has millions of downloads per month. It is designed to provide high performance on complex queries against large databases in embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases DuckDB is not focusing on transactional (OLTP) applications and instead is specialized for online analytical processing (OLAP) workloads.

YDB is a distributed SQL database management system (DBMS) developed by Yandex, available as open-source technology.

References

  1. "v24.4.1.2088-stable". Github. Retrieved 4 May 2024.
  2. "ClickHouse Raises $250M Series B to Scale Groundbreaking OLAP Database Management System Globally". 28 October 2021.
  3. "Yandex, Russia's biggest technology company, celebrates 20 years". The Economist. 30 September 2017.
  4. Lardinois, Frederic (2022-12-06). "ClickHouse launches ClickHouse Cloud, extends its Series B". TechCrunch. Retrieved 2023-07-24.
  5. 1 2 "ClickHouse Guide". clickhouse.yandex. Retrieved 2016-11-10.
  6. 1 2 "Performance comparison of analytical DBMS". clickhouse.yandex. Retrieved 2016-11-10.
  7. "smi2/phpClickHouse". GitHub. Retrieved 2016-11-10.
  8. "apla/node-clickhouse". GitHub. Retrieved 2016-11-10.
  9. "elcamlost/perl-DBD-ClickHouse". GitHub. Retrieved 2016-11-10.
  10. "archan937/clickhouse". GitHub. Retrieved 2016-11-10.
  11. "hannesmuehleisen/clickhouse-r". GitHub. Retrieved 2016-11-10.
  12. "ClickHouse/clickhouse-odbc". GitHub. 13 December 2021.
  13. "ClickHouse/clickhouse-jdbc". GitHub. 11 December 2021.