ClickHouse

Last updated
Clickhouse
Developer(s) ClickHouse, Inc.
Initial releaseJune 15, 2016;8 years ago (2016-06-15)
Stable release
v24.12.1.1614-stable / December 19, 2024;1 day ago (2024-12-19) [1]
Repository github.com/ClickHouse/ClickHouse/
Written in C++
Operating system Linux, FreeBSD, macOS
License Apache License 2.0
Website clickhouse.com

ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in the San Francisco Bay Area with the subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands.

Contents

In September 2021 in San Francisco, CA, ClickHouse incorporated to house the open source technology with an initial $50 million investment from Index Ventures and Benchmark Capital with participation by Yandex N.V. [2] and others. On October 28, 2021 the company received Series B funding totaling $250 million at a valuation of $2 billion from Coatue Management, Altimeter Capital, and other investors. The company continues to build the open source project and engineering cloud technology.

History

ClickHouse’s technology was first developed over 10 years ago at Yandex, Russia's largest technology company. [3] In 2009, Alexey Milovidov and developers started an experimental project to check the hypothesis if it was viable to generate analytical reports in real-time from non-aggregated data that is also constantly added in real-time. The developers spent 3 years to prove this hypothesis, and in 2012 ClickHouse launched in production for the first time to power Yandex.Metrica.

Unlike custom data structures used before, ClickHouse was applicable more generally to work as a database management system. The power and utility of ClickHouse offered a true column-oriented DBMS, it allowed for systems to generate reports from petabytes of raw data with sub-second latencies. ClickHouse was widely adopted at Yandex including for Yandex.Tank load testing tool and Yandex.Market to monitor site accessibility and KPIs.

In 2016, the ClickHouse project was released as open-source software under the Apache 2 license in June 2016 to power analytical use cases around the globe. The systems at the time offered a server throughput of a hundred thousand rows per second, ClickHouse outperformed them with a throughput of hundreds of millions of rows per second[ citation needed ].

Since ClickHouse became available as open source in 2016, its popularity has grown exponentially, as evidenced through adoption by industry-leading companies like Uber, Comcast, eBay, and Cisco. [4] ClickHouse was also implemented at CERN's LHCb experiment to store and process metadata on 10 billion events with over 1000 attributes per event.

Features

The main features of the ClickHouse DBMS are: [5]

Limitations

ClickHouse has some features that can be considered disadvantages:

Use cases

ClickHouse was designed for OLAP queries. [5] ClickHouse performs well when:

For simple queries, latencies of 50 ms are typical.

One of the common cases for ClickHouse is server log analysis. After setting regular data uploads to ClickHouse (it's recommended to insert data in fairly large batches with more than 1000 rows), it's possible to analyze incidents with instant queries or monitor a service's metrics, such as error rates, response times, and so on.

ClickHouse can also be used as an internal data warehouse for in-house analysts. ClickHouse can store data from different systems (such as Hadoop or certain logs) and analysts can build internal dashboards with the data or perform real-time analysis for business purposes.

Benchmark results

According to benchmark tests conducted by its developers, [6] for OLAP queries ClickHouse is more than 100 times faster than Hive (a DBMS based on the Hadoop technology stack) or MySQL (a common RDBMS).

See also

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Windows, Linux, macOS, FreeBSD, and OpenBSD, and handles a range of workloads from single machines to data warehouses, data lakes, or web services with many concurrent users.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.

In computing, online analytical processing, or OLAP, is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

<span class="mw-page-title-main">Virtuoso Universal Server</span> Computer software

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

Simba Technologies Inc. is a software company based in Vancouver, British Columbia, Canada. Simba specializes in products for ODBC, JDBC, OLE DB for OLAP (ODBO) and XML for Analysis (XMLA). The company licenses data connectivity technologies, and provides software development for Microsoft Windows, Linux, UNIX, Mac and mobile device platforms. Simba Technologies was founded as PageAhead Software in Vancouver and Seattle, Washington in 1991 and changed its name in 1995. Customers include Microsoft, Oracle Corporation, MIS AG, SAP AG and Descisys.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

<span class="mw-page-title-main">Netezza</span> Provider of Integrated Data Warehouse Hardware and Software

IBM Netezza is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for the most demanding analytic uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.

<span class="mw-page-title-main">Exasol</span> German database management software company

Exasol is an analytics database management software company. Its product is called Exasol, an in-memory, column-oriented, relational database management system

<span class="mw-page-title-main">Vertica</span> Software company

Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.

The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information.

Java Database Connectivity (JDBC) is an application programming interface (API) for the Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. It is part of the Java Standard Edition platform, from Oracle Corporation. It provides methods to query and update data in a database, and is oriented toward relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the Java virtual machine (JVM) host environment.

eXtremeDB is a high-performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/C++ based programs. It runs on Windows, Linux, and other real-time and embedded operating systems.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

<span class="mw-page-title-main">Apache Pinot</span> Open-source distributed data store

Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion. The name Pinot comes from the Pinot grape vines that are pressed into liquid that is used to produce a variety of different wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of different file formats or streaming data sources.

YDB is a distributed SQL database management system (DBMS) developed by Yandex, available as open-source technology.

References

  1. "v24.12.1.1614-stable". Github. Retrieved 20 December 2024.
  2. "ClickHouse Raises $250M Series B to Scale Groundbreaking OLAP Database Management System Globally". 28 October 2021.
  3. "Yandex, Russia's biggest technology company, celebrates 20 years". The Economist. 30 September 2017.
  4. Lardinois, Frederic (2022-12-06). "ClickHouse launches ClickHouse Cloud, extends its Series B". TechCrunch. Retrieved 2023-07-24.
  5. 1 2 "ClickHouse Guide". clickhouse.yandex. Retrieved 2016-11-10.
  6. 1 2 "Performance comparison of analytical DBMS". clickhouse.yandex. Retrieved 2016-11-10.
  7. "smi2/phpClickHouse". GitHub. Retrieved 2016-11-10.
  8. "apla/node-clickhouse". GitHub. Retrieved 2016-11-10.
  9. "elcamlost/perl-DBD-ClickHouse". GitHub. Retrieved 2016-11-10.
  10. "archan937/clickhouse". GitHub. Retrieved 2016-11-10.
  11. "hannesmuehleisen/clickhouse-r". GitHub. Retrieved 2016-11-10.
  12. "ClickHouse/clickhouse-odbc". GitHub. 13 December 2021.
  13. "ClickHouse/clickhouse-jdbc". GitHub. 11 December 2021.