Time series database

Last updated December 19, 2024

A time series database is a software system that is optimized for storing and serving time series through associated pairs of time(s) and value(s).^[1] In some fields, time series may be called profiles, curves, traces or trends.^[2] Several early time series databases were built for industrial applications, where they efficiently stored measured values from sensor equipment (such systems are also referred to as data historians); time series databases are now used in support of a much wider range of applications. In many cases, repositories of time-series data use compression algorithms to manage the data efficiently.^[3]^[4] Although time-series data can be stored in many different database types, systems designed with time as a key index are distinctly different from relational databases, which express discrete relationships through referential models.^[5]
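As an illustration of the kind of compression mentioned above, the following is a simplified sketch of delta-of-delta encoding for timestamps, a technique described in the Gorilla paper.^[3] It assumes integer timestamps and leaves out the variable-length bit packing that a real system would apply to the small residuals; the function names are hypothetical and are not taken from any particular database.

```python
def delta_of_delta_encode(timestamps):
    """Encode integer timestamps as [first, dod_1, dod_2, ...].

    Each dod is the change in the interval between consecutive points
    (the first interval is compared against an assumed interval of 0).
    """
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_delta = 0
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Readings taken roughly every 10 seconds (illustrative values only).
readings = [1700000000, 1700000010, 1700000020, 1700000031, 1700000041]
encoded = delta_of_delta_encode(readings)
decoded = delta_of_delta_decode(encoded)
```

When samples arrive at a near-constant interval, most encoded entries after the first two are zero or close to zero, which is what makes them cheap to store.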

Overview

Time series datasets are relatively large and uniform compared to other datasets, usually being composed of a timestamp and associated data.^[6] Time series datasets also tend to have fewer relationships between data entries in different tables and do not require indefinite storage of entries.^[6] These properties mean that time series databases can provide significant improvements in storage space and performance over general-purpose databases.^[6] For instance, because time series data is uniform, specialized compression algorithms can outperform regular compression algorithms designed to work on less uniform data.^[6] Time series databases can also be configured to regularly delete (or downsample) old data, unlike regular databases, which are designed to store data indefinitely.^[6] Special database indices can also provide boosts in query performance.^[6]
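The retention and downsampling behaviour described above can be sketched in a few lines. This is a minimal in-memory illustration, assuming points are (timestamp in seconds, value) tuples and that averaging is an acceptable aggregate; the function names and sample data are hypothetical and do not reflect the API of any listed database.

```python
from collections import defaultdict
from statistics import mean


def apply_retention(points, now, max_age):
    """Keep only (timestamp, value) points no older than max_age seconds."""
    cutoff = now - max_age
    return [(ts, v) for ts, v in points if ts >= cutoff]


def downsample(points, bucket_seconds):
    """Average values into fixed-width buckets, keyed by bucket start time."""
    buckets = defaultdict(list)
    for ts, v in points:
        buckets[ts - ts % bucket_seconds].append(v)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


# Illustrative data: five readings over a little more than two minutes.
points = [(0, 1.0), (20, 3.0), (60, 10.0), (90, 20.0), (130, 5.0)]

recent = apply_retention(points, now=130, max_age=70)
per_minute = downsample(points, bucket_seconds=60)
```

Here `recent` drops the two oldest readings, while `per_minute` replaces the five raw readings with one averaged value per 60-second bucket.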

List of time series databases

The following database systems have functionality optimized for handling time series data.


Name	License	Language	References
Amazon Timestream for LiveAnalytics	Commercial	Java	^[7]
Apache IoTDB	Apache License 2.0	Java	^[8]
Apache Kudu	Apache License 2.0	C++	^[9]
Apache Pinot	Apache License 2.0	Java	^[10]
ClickHouse	Apache License 2.0	C++	^[11]
CrateDB	Apache License 2.0	Java	^[12]^[13]
eXtremeDB	Commercial	SQL, Python, C / C++, Java, and C#	^[14]
InfluxDB	MIT;^[15] Chronograf: AGPLv3; clustering: commercial^[16]	Go (version 2), Rust (version 3)^[17]	^[14]^[18]
Informix TimeSeries	Commercial	C / C++	^[14]^[19]
Kx kdb+	Commercial	Q	^[14]
MongoDB	Server Side Public License	C++, JavaScript, Python	^[20]
Prometheus	Apache License 2.0	Go	^[14]
RedisTimeSeries	RSALv2/SSPLv1 ^[21]	C	^[22]
Riak-TS	Apache License 2.0	Erlang	^[14]
RRDtool	GPLv2	C	^[14]
TimescaleDB	Apache License 2.0	C	^[23]
Whisper (Graphite)	Apache License 2.0	Python	^[24]

References

[1] Mueen, Abdullah; Keogh, Eamonn; Zhu, Qiang; Cash, Sydney; Westover, Brandon (2009). "Exact Discovery of Time Series Motifs". Proceedings of the 2009 SIAM International Conference on Data Mining (PDF). Vol. 2009. pp. 473–484. doi:10.1137/1.9781611972795.41. ISBN 978-0-89871-682-5. PMC 6814436. PMID 31656693. Archived from the original (PDF) on 25 June 2010. Retrieved 31 July 2019. Definition 2: A Time Series Database (D) is an unordered set of m time series possibly of different lengths.
[2] Villar-Rodriguez, Esther; Del Ser, Javier; Oregi, Izaskun; Bilbao, Miren Nekane; Gil-Lopez, Sergio (2017). "Detection of non-technical losses in smart meter data based on load curve profiling and time series analysis". Energy. 137: 118–128. Bibcode:2017Ene...137..118V. doi:10.1016/j.energy.2017.07.008. hdl:20.500.11824/693.
[3] Pelkonen, Tuomas; Franklin, Scott; Teller, Justin; Cavallaro, Paul; Huang, Qi; Meza, Justin; Veeraraghavan, Kaushik (2015). "Gorilla". Proceedings of the VLDB Endowment. 8 (12): 1816–1827. doi:10.14778/2824032.2824078.
[4] Lockerman, Joshua (2020-04-22). "Time-series compression algorithms, explained". Timescale Blog. Retrieved 2022-10-07.
[5] Asay, Matt (26 June 2019). "Why time series databases are exploding in popularity". TechRepublic. Archived from the original on 26 June 2019. Retrieved 31 July 2019. Relational databases and NoSQL databases can be used for time series data, but arguably developers will get better performance from purpose-built time series databases, rather than trying to apply a one-size-fits-all database to specific workloads.
[6] Wayner, Peter (15 January 2021). "Database trends: The rise of the time-series database". VentureBeat. Retrieved 7 July 2021.
[7] "Amazon Timestream - Time series is the new black". June 2021.
[8] Wang, Chen; Huang, Xiangdong; Qiao, Jialin; Jiang, Tian; Rui, Lei; Zhang, Jinrui; Kang, Rong; Feinauer, Julian; McGrail, Kevin A.; Wang, Peng; Luo, Diaohan; Yuan, Jun; Wang, Jianmin; Sun, Jiaguang (August 2020). "Apache IoTDB: time-series database for internet of things". Proceedings of the VLDB Endowment. 13 (12): 2901–2904. doi:10.14778/3415478.3415504. ISSN 2150-8097. S2CID 221352039.
[9] "Benchmarking Time Series workloads on Apache Kudu using TSBS". 18 March 2020.
[10] Fu, Yupeng; Soman, Chinmay (9 June 2021). "Real-time Data Infrastructure at Uber". Proceedings of the 2021 International Conference on Management of Data. pp. 2503–2516. arXiv:2104.00087. doi:10.1145/3448016.3457552. ISBN 9781450383431. S2CID 232478317.
[11] Schulze, Robert; Schreiber, Tom; Yatsishin, Ilya; Dahimene, Ryadh; Milovidov, Alexey (August 2024). "ClickHouse - Lightning Fast Analytics for Everyone" (PDF). Proceedings of the VLDB Endowment. 17 (12): 3731–3744. doi:10.14778/3685800.3685802.
[12] "DB-Engines Ranking". DB-Engines. Retrieved 2023-01-22.
[13] "Anforderungen für Zeitreihendatenbanken im industriellen IoT". springerprofessional.de (in German). Retrieved 2023-01-22.
[14] Stephens, Rachel (2018-04-03). "State of the Time Series Database Market". Retrieved 2018-10-03.
[15] "influxdb license". GitHub. Retrieved 2016-08-14.
[16] "influxdb clustering". influxdata.com. Retrieved 2016-03-10.
[17] Wachtel, Jessica (2023-07-06). "Meet the Founders Who Rewrote in Rust". InfluxData. Retrieved 2023-10-05.
[18] Anadiotis, George (2018-09-28). "Processing time series data: What are the options?". ZDNet. Retrieved 2016-03-10.
[19] Dantale, Viabhav (2012-09-21). Solving Business Problems with Informix TimeSeries (PDF). IBM Redbooks. ISBN 9780738437231.
[20] "MongoDB's New Time Series Collections".
[21] "RedisTimeSeries/LICENSE.txt at master · RedisTimeSeries/RedisTimeSeries". GitHub. Retrieved 2023-10-05.
[22] "RedisTimeSeries". Redis. Retrieved 12 June 2023.
[23] Design Recommendations for Intelligent Tutoring Systems: Volume 8 - Data Visualization. Army Research Laboratory. December 29, 2020. p. 50. ISBN 9780997725780.
[24] Joshi, Nishes (May 23, 2012). Interoperability in monitoring and reporting systems (Thesis). hdl:10852/9085.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
