Time series database

A time series database is a software system optimized for storing and serving time series, represented as associated pairs of times and values. [1] In some fields, time series are called profiles, curves, traces or trends. [2] Several early time series databases were built for industrial applications, where they efficiently stored measured values from sensor equipment (such systems are also referred to as data historians); they are now used in support of a much wider range of applications. Repositories of time-series data often use compression algorithms to manage the data efficiently. [3] [4] Although time-series data can be stored in many different database types, systems designed with time as a key index are distinctly different from relational databases, which represent discrete relationships through referential models. [5]
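To make the idea of "time as a key index" concrete, here is a minimal in-memory sketch. It is not how any particular product works: the class `TinySeries` and its methods are hypothetical names chosen for illustration, and timestamps are assumed to be plain integers such as Unix seconds. Keeping the timestamps sorted lets a time-range query become two binary searches rather than a full scan.

```python
import bisect


class TinySeries:
    """Hypothetical in-memory series that keeps (timestamp, value) pairs sorted by time."""

    def __init__(self):
        self._times = []   # sorted timestamps (assumed integers, e.g. Unix seconds)
        self._values = []  # values aligned index-by-index with self._times

    def insert(self, ts, value):
        # Locate the slot that keeps timestamps sorted. In-order arrivals
        # land at the end of the list, which is the common case.
        i = bisect.bisect_right(self._times, ts)
        self._times.insert(i, ts)
        self._values.insert(i, value)

    def range(self, start, end):
        """Return all (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


series = TinySeries()
# Points may arrive slightly out of order; insert() keeps them sorted.
for ts, v in [(10, 1.0), (30, 3.0), (20, 2.0), (40, 4.0)]:
    series.insert(ts, v)

print(series.range(15, 40))
```

Real systems typically partition data by time and persist it to disk, but the same principle applies: queries are phrased as time ranges, so the storage layout is ordered by time.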

Overview

Time series datasets are relatively large and uniform compared to other datasets, usually consisting of a timestamp and associated data. [6] They also tend to have fewer relationships between data entries in different tables and do not require entries to be stored indefinitely. [6] These properties allow time series databases to provide significant improvements in storage space and performance over general-purpose databases. [6] For instance, because time series data is so uniform, specialized compression algorithms can outperform general compression algorithms designed for less uniform data. [6] Time series databases can also be configured to regularly delete (or downsample) old data, unlike general-purpose databases, which are designed to store data indefinitely. [6] Specialized database indices can further improve query performance. [6]
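Two of these ideas can be sketched in a few lines of Python. The first pair of functions illustrates why uniform data compresses well: regularly sampled timestamps have near-constant gaps, so storing the change in the gap (a delta-of-delta scheme, similar in spirit to the timestamp encoding described in the Gorilla paper [3]) yields mostly zeros. The last function illustrates downsampling: points older than a cutoff are averaged into fixed-width buckets. All function names and parameters here are assumptions made for illustration, timestamps are assumed to be sorted integers, and a real system would go on to pack the small numbers into a compact bit-level format.

```python
def delta_of_delta_encode(timestamps):
    """Encode sorted integer timestamps as [first, first_delta, dod, dod, ...]."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # change in the gap; 0 when sampling is regular
        prev_delta = delta
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    out = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        out.append(out[-1] + delta)
    return out


def downsample(points, bucket_seconds, cutoff):
    """Average points older than `cutoff` into buckets; keep newer points raw.

    `points` is an iterable of (timestamp, value) pairs sorted by timestamp.
    """
    buckets = {}
    recent = []
    for ts, value in points:
        if ts < cutoff:
            start = ts - ts % bucket_seconds  # start of the bucket containing ts
            total, count = buckets.get(start, (0.0, 0))
            buckets[start] = (total + value, count + 1)
        else:
            recent.append((ts, value))
    old = [(start, total / count) for start, (total, count) in sorted(buckets.items())]
    return old + recent


timestamps = [1000, 1010, 1020, 1030, 1041]
encoded = delta_of_delta_encode(timestamps)
print(encoded)

points = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]
print(downsample(points, bucket_seconds=60, cutoff=120))
```

In the encoded list, only the first timestamp and the first gap are large; every later entry is the small deviation from regular sampling. Deleting old data outright, rather than downsampling it, would simply drop the points below the cutoff.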

List of time series databases

The following database systems have functionality optimized for handling time series data.

| Name | License | Language | References |
| --- | --- | --- | --- |
| Apache IoTDB | Apache License 2.0 | Java | [7] |
| Apache Kudu | Apache License 2.0 | C++ | [8] |
| Apache Pinot | Apache License 2.0 | Java | [9] |
| CrateDB | Apache License 2.0 | Java | [10] [11] |
| eXtremeDB | Commercial | SQL, Python, C / C++, Java, and C# | [12] |
| InfluxDB | MIT [13]; Chronograf AGPLv3; Clustering Commercial [14] | Go (version 2), Rust (version 3) [15] | [12] [16] |
| Informix TimeSeries | Commercial | C / C++ | [12] [17] |
| Kx kdb+ | Commercial | Q | [12] |
| MongoDB | Server Side Public License | C++, JavaScript, Python | [18] |
| Prometheus | Apache License 2.0 | Go | [12] |
| RedisTimeSeries | RSALv2/SSPLv1 [19] | C | [20] |
| Riak-TS | Apache License 2.0 | Erlang | [12] |
| RRDtool | GPLv2 | C | [12] |
| TimescaleDB | Apache License 2.0 | C | [21] |
| Whisper (Graphite) | Apache License 2.0 | Python | [22] |

References

  1. Mueen, Abdullah; Keogh, Eamonn; Zhu, Qiang; Cash, Sydney; Westover, Brandon (2009). "Exact Discovery of Time Series Motifs" (PDF). University of California, Riverside. 2009: 473–484. doi:10.1137/1.9781611972795.41. ISBN 978-0-89871-682-5. PMC 6814436. PMID 31656693. Archived from the original (PDF) on 25 June 2010. Retrieved 31 July 2019. Definition 2: A Time Series Database (D) is an unordered set of m time series possibly of different lengths.
  2. Villar-Rodriguez, Esther; Del Ser, Javier; Oregi, Izaskun; Bilbao, Miren Nekane; Gil-Lopez, Sergio (2017). "Detection of non-technical losses in smart meter data based on load curve profiling and time series analysis". Energy. 137: 118–128. doi:10.1016/j.energy.2017.07.008. hdl:20.500.11824/693.
  3. Pelkonen, Tuomas; Franklin, Scott; Teller, Justin; Cavallaro, Paul; Huang, Qi; Meza, Justin; Veeraraghavan, Kaushik (2015). "Gorilla". Proceedings of the VLDB Endowment. 8 (12): 1816–1827. doi:10.14778/2824032.2824078.
  4. Lockerman, Joshua (2020-04-22). "Time-series compression algorithms, explained". Timescale Blog. Retrieved 2022-10-07.
  5. Asay, Matt (26 June 2019). "Why time series databases are exploding in popularity". TechRepublic. Archived from the original on 26 June 2019. Retrieved 31 July 2019. Relational databases and NoSQL databases can be used for time series data, but arguably developers will get better performance from purpose-built time series databases, rather than trying to apply a one-size-fits-all database to specific workloads.
  6. Wayner, Peter (15 January 2021). "Database trends: The rise of the time-series database". VentureBeat. Retrieved 7 July 2021.
  7. Wang, Chen; Huang, Xiangdong; Qiao, Jialin; Jiang, Tian; Rui, Lei; Zhang, Jinrui; Kang, Rong; Feinauer, Julian; McGrail, Kevin A.; Wang, Peng; Luo, Diaohan; Yuan, Jun; Wang, Jianmin; Sun, Jiaguang (August 2020). "Apache IoTDB: time-series database for internet of things". Proceedings of the VLDB Endowment. 13 (12): 2901–2904. doi:10.14778/3415478.3415504. ISSN 2150-8097. S2CID 221352039.
  8. "Benchmarking Time Series workloads on Apache Kudu using TSBS". 18 March 2020.
  9. Fu, Yupeng; Soman, Chinmay (9 June 2021). "Real-time Data Infrastructure at Uber". Proceedings of the 2021 International Conference on Management of Data. pp. 2503–2516. arXiv:2104.00087. doi:10.1145/3448016.3457552. ISBN 9781450383431. S2CID 232478317.
  10. "DB-Engines Ranking". DB-Engines. Retrieved 2023-01-22.
  11. "Anforderungen für Zeitreihendatenbanken im industriellen IoT". springerprofessional.de (in German). Retrieved 2023-01-22.
  12. Stephens, Rachel (2018-04-03). "State of the Time Series Database Market". Retrieved 2018-10-03.
  13. "influxdb license". GitHub. Retrieved 2016-08-14.
  14. "influxdb clustering". influxdata.com. Retrieved 2016-03-10.
  15. Wachtel, Jessica (2023-07-06). "Meet the Founders Who Rewrote in Rust". InfluxData. Retrieved 2023-10-05.
  16. Anadiotis, George (2018-09-28). "Processing time series data: What are the options?". zdnet.com. Retrieved 2016-03-10.
  17. Dantale, Viabhav (2012-09-21). Solving Business Problems with Informix TimeSeries (PDF). IBM Redbooks. ISBN 9780738437231.
  18. "MongoDB's New Time Series Collections".
  19. "RedisTimeSeries/LICENSE.txt at master · RedisTimeSeries/RedisTimeSeries". GitHub. Retrieved 2023-10-05.
  20. "RedisTimeSeries". Redis. Retrieved 12 June 2023.
  21. Design Recommendations for Intelligent Tutoring Systems: Volume 8 - Data Visualization. Army Research Laboratory. December 29, 2020. p. 50. ISBN 9780997725780.
  22. Joshi, Nishes (May 23, 2012). Interoperability in monitoring and reporting systems (Thesis). hdl:10852/9085.