This article has multiple issues. Please help improve it or discuss these issues on the talk page . (Learn how and when to remove these messages)
|
Developer(s) | ClickHouse, Inc. |
---|---|
Initial release | June 15, 2016 |
Stable release | v24.6.2.17-stable / July 5, 2024 [1] |
Repository | github |
Written in | C++ |
Operating system | Linux, FreeBSD, macOS |
License | Apache License 2.0 |
Website | clickhouse |
ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in the San Francisco Bay Area with the subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands.
In September 2021 in San Francisco, CA, ClickHouse incorporated to house the open source technology with an initial $50 million investment from Index Ventures and Benchmark Capital with participation by Yandex N.V. [2] and others. On October 28, 2021 the company received Series B funding totaling $250 million at a valuation of $2 billion from Coatue Management, Altimeter Capital, and other investors. The company continues to build the open source project and engineering cloud technology.
ClickHouse’s technology was first developed over 10 years ago at Yandex, Russia's largest technology company. [3] In 2009, Alexey Milovidov and developers started an experimental project to check the hypothesis if it was viable to generate analytical reports in real-time from non-aggregated data that is also constantly added in real-time. The developers spent 3 years to prove this hypothesis, and in 2012 ClickHouse launched in production for the first time to power Yandex.Metrica.
Unlike custom data structures used before, ClickHouse was applicable more generally to work as a database management system. The power and utility of ClickHouse offered a true column-oriented DBMS, it allowed for systems to generate reports from petabytes of raw data with sub-second latencies. ClickHouse was widely adopted at Yandex including for Yandex.Tank load testing tool and Yandex.Market to monitor site accessibility and KPIs.
In 2016, the ClickHouse project was released as open-source software under the Apache 2 license in June 2016 to power analytical use cases around the globe. The systems at the time offered a server throughput of a hundred thousand rows per second, ClickHouse outperformed them with a throughput of hundreds of millions of rows per second[ citation needed ].
Since ClickHouse became available as open source in 2016, its popularity has grown exponentially, as evidenced through adoption by industry-leading companies like Uber, Comcast, eBay, and Cisco. [4] ClickHouse was also implemented at CERN's LHCb experiment to store and process metadata on 10 billion events with over 1000 attributes per event.
The main features of the ClickHouse DBMS are: [5]
ClickHouse has some features that can be considered disadvantages:
ClickHouse was designed for OLAP queries. [5] ClickHouse performs well when:
For simple queries, latencies of 50 ms are typical.
One of the common cases for ClickHouse is server log analysis. After setting regular data uploads to ClickHouse (it's recommended to insert data in fairly large batches with more than 1000 rows), it's possible to analyze incidents with instant queries or monitor a service's metrics, such as error rates, response times, and so on.
ClickHouse can also be used as an internal data warehouse for in-house analysts. ClickHouse can store data from different systems (such as Hadoop or certain logs) and analysts can build internal dashboards with the data or perform real-time analysis for business purposes.
According to benchmark tests conducted by its developers, [6] for OLAP queries ClickHouse is more than 100 times faster than Hive (a DBMS based on the Hadoop technology stack) or MySQL (a common RDBMS).
Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.
In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.
In computing, online analytical processing, or OLAP, is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.
Microsoft SQL Server Analysis Services (SSAS) is an online analytical processing (OLAP) and data mining tool in Microsoft SQL Server. SSAS is used as a tool by organizations to analyze and make sense of information possibly spread out across multiple databases, or in disparate tables or files. Microsoft has included a number of services in SQL Server related to business intelligence and data warehousing. These services include Integration Services, Reporting Services and Analysis Services. Analysis Services includes a group of OLAP and data mining capabilities and comes in two flavors multidimensional and tabular, where the difference between the two is how the data is presented. In a tabular model, the information is arranged in two-dimensional tables which can thus be more readable for a human. A multidimensional model can contain information with many degrees of freedom, and must be unfolded to increase readability by a human.
In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.
Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.
SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.
An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:
HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
IBM Netezza is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for the most demanding analytic uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.
Exasol is an analytics database management software company. Its product is called Exasol, an in-memory, column-oriented, relational database management system
Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.
The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information.
Java Database Connectivity (JDBC) is an application programming interface (API) for the Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. It is part of the Java Standard Edition platform, from Oracle Corporation. It provides methods to query and update data in a database, and is oriented toward relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the Java virtual machine (JVM) host environment.
The following is provided as an overview of and topical guide to databases:
SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.
SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE. Its primary function as the software running a database server is to store and retrieve data as requested by the applications. In addition, it performs advanced analytics and includes extract, transform, load (ETL) capabilities as well as an application server.
Azure Cosmos DB is a globally distributed, multi-model database service offered by Microsoft. It is designed to provide high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a NoSQL and vector database, which means it can handle unstructured, semi-structured, structured, and vector data types.
Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion. The name Pinot comes from the Pinot grape vines that are pressed into liquid that is used to produce a variety of different wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of different file formats or streaming data sources.
YDB is a distributed SQL database management system (DBMS) developed by Yandex, available as open-source technology.