Hybrid transactional/analytical processing

Hybrid transaction/analytical processing (HTAP) is a term created by Gartner Inc., an information technology research and advisory company, in its early 2014 research report Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation. [1] As defined by Gartner:

Hybrid transaction/analytical processing (HTAP) is an emerging application architecture that "breaks the wall" between transaction processing and analytics. It enables more informed and "in business real time" decision making. [2] [1]

In more recent reports, Gartner has begun referring to HTAP as "augmented transactions". [3] Another analyst firm, Forrester Research, calls the same concept "Translytical", [4] while the 451 Group calls it "hybrid operational and analytical processing", or HOAP. [5]

Background

In the 1960s, computer use in the business sector began with payroll transactions and later included tasks in areas such as accounting and billing. At that time, users entered data, and the system processed it at a later time. Further development of instantaneous data processing, or online transaction processing (OLTP), led to widespread OLTP use in government and business-sector information systems. [6]

Online analytical processing (OLAP) covers the analytical processing involved in creating, synthesizing, and managing data. As data demands among businesses have grown, OLAP has also evolved. To meet the needs of their respective applications, the two technologies have traditionally relied on separate systems with distinct architectures. [7] [6] The resulting complexity in the information architecture and infrastructure of both OLTP and OLAP systems delays data analysis. [7]
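
This delay arises from the traditional practice of moving data between the two systems in batches. The following sketch is a minimal illustration with invented table names (not taken from the cited sources): transactions land in an operational store and only become visible to analytical queries after a periodic extract-transform-load (ETL) step.

```python
import sqlite3

# Two separate systems, as in the traditional split architecture.
oltp = sqlite3.connect(":memory:")  # stands in for the operational (OLTP) database
olap = sqlite3.connect(":memory:")  # stands in for the data warehouse (OLAP)

oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)")
olap.execute("CREATE TABLE orders_fact (id INTEGER, amount REAL, status TEXT)")

# Operational side: a continuous stream of small transactions.
oltp.execute("INSERT INTO orders VALUES (1, 19.99, 'open')")
oltp.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
oltp.commit()

def run_etl():
    """Batch copy from the operational store into the warehouse. Until this runs,
    analytical queries cannot see the latest transactions, which is the source
    of the analytic latency described above."""
    rows = oltp.execute("SELECT id, amount, status FROM orders").fetchall()
    olap.execute("DELETE FROM orders_fact")
    olap.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", rows)
    olap.commit()

run_etl()  # typically scheduled hourly or nightly rather than per transaction

# Analytical side: reporting runs against the (possibly stale) copy.
print(olap.execute("SELECT status, SUM(amount) FROM orders_fact GROUP BY status").fetchall())
```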

HTAP advantages and challenges

There are various interpretations of HTAP beyond Gartner's original definition of an "emerging architecture". These interpretations suggest different advantages, one of which is database functionality: recent advances in research, hardware, OLTP and OLAP capabilities, in-memory and cloud-native database technologies, [8] scalable transaction management, and products enable transactional processing and analytics, or HTAP, to operate on the same database. [7] [9] [6]

However, Gartner's most recent reports suggest broader advantages than a single unified database can offer. Traditional application architectures separated transactional and analytical systems. Digital business, and the need to respond to business moments, means that using "after the fact" analysis is no longer adequate. Business moments are transient opportunities that must be exploited in real time. If an organization is unable to recognize and/or respond quickly to a business moment by taking fast and well-informed decisions, then some other organization will, resulting in a missed opportunity (or a new business threat). HTAP allows advanced analytics to be run in real time on "in flight" transaction data, providing an architecture that empowers users to respond more effectively to business moments. [10]

The main technical challenges for an HTAP database are how to be efficient on the same database system both for operational workloads (many small transactions with a high fraction of updates) and for analytical workloads (large, complex queries traversing a large number of rows), and how to prevent the analytical queries from interfering with the operational workload. This kind of combined workload is also commonly referred to as Operational Analytical Processing.
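
As an illustration of the two workload shapes, the minimal sketch below (an invented schema, not a description of any particular HTAP product) issues both a short point update and a long scanning aggregate against the same table; because the analytics runs directly on the operational data, no separate warehouse copy or ETL step is involved.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer INTEGER, amount REAL, region TEXT)"
)
db.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?)",
    [(i, i % 100, float(i % 7), "EU" if i % 2 else "US") for i in range(10_000)],
)
db.commit()

# Operational workload: many small transactions, each touching a handful of rows.
db.execute("UPDATE payments SET amount = amount + 1 WHERE id = ?", (42,))
db.commit()

# Analytical workload: a single large query traversing a large number of rows.
report = db.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM payments GROUP BY region"
).fetchall()
print(report)
```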

HTAP solves the issue of analytic latency in several ways, including eliminating the need for multiple copies of the same data and the requirement for data to be offloaded from operational databases to data warehouses via ETL processes. [7] [9]

Most applications of HTAP are enabled by in-memory technologies that can process a high volume of transactions and offer features such as forecasting and simulations. New HTAP technologies use scalable transactional processing, and do not need to rely on keeping the whole database in-memory. HTAP has the potential to change the way organizations do business by offering immediate business decision-making capabilities based on live and sophisticated analytics of large volumes of data. Government and business leaders can be informed of real-time issues, outcomes, and trends that necessitate action, such as in the areas of public safety, risk management, and fraud detection. [7] [11]

Some challenges for HTAP include limited industry experience and skills, as well as undefined best practices. [7]

In 2020, a team at PingCAP published the first industry paper describing the practical implementation of a distributed hybrid transactional/analytical processing (HTAP) database: "TiDB: A Raft-based HTAP Database". [12]

Related Research Articles

Data warehouse – Centralized storage of knowledge

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is a core component of business intelligence. Data warehouses are central repositories of data integrated from disparate sources. They store current and historical data organized so as to make it easy to create reports, query and get insights from the data. Unlike operational databases, they are intended to be used by analysts and managers to help make organizational decisions.

IBM Db2 – Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

Business intelligence (BI) consists of strategies, methodologies, and technologies used by enterprises for data analysis and management of business information. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.

In computing, online analytical processing, or OLAP, is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

Essbase is a multidimensional database management system (MDBMS) that provides a platform upon which to build analytic applications. Essbase began as a product from Arbor Software, which merged with Hyperion Software in 1998. Oracle Corporation acquired Hyperion Solutions Corporation in 2007. Until late 2005 IBM also marketed an OEM version of Essbase as DB2 OLAP Server.

Online transaction processing (OLTP) is a type of database system used in transaction-oriented applications, such as many operational systems. "Online" refers to the fact that such systems are expected to respond to user requests and process them in real-time. The term is contrasted with online analytical processing (OLAP) which instead focuses on data analysis.

Data orientation refers to how tabular data is represented in a linear memory model, such as on disk or in memory. The two most common representations are column-oriented and row-oriented.
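
As a minimal illustration (with invented data) of the two orientations, the same three-row table can be laid out record by record or field by field; the row layout favors fetching or updating whole records, while the column layout favors scanning a single field.

```python
# Row-oriented: each record's fields are stored contiguously.
rows = [
    (1, "alice", 19.99),
    (2, "bob", 5.00),
    (3, "carol", 7.50),
]

# Column-oriented: each field's values are stored contiguously.
columns = {
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "amount": [19.99, 5.00, 7.50],
}

print(rows[1])                 # cheap whole-record access (typical OLTP pattern)
print(sum(columns["amount"]))  # cheap single-column aggregate (typical OLAP pattern)
```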

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

Operational database management systems are used to update data in real time. These types of databases allow users to do more than simply view archived data; they allow the data itself to be modified in real time. OLTP databases provide transactions as the main abstraction for guaranteeing data consistency, expressed as the so-called ACID properties: consistency of the data is guaranteed in the case of failures and/or concurrent access to the data.

Exasol – German database management software company

Exasol is an analytics database management software company. Its product is called Exasol, an in-memory, column-oriented, relational database management system.

Database activity monitoring is a database security technology for monitoring and analyzing database activity. DAM may combine data from network-based monitoring and native audit information to provide a comprehensive picture of database activity. The data gathered by DAM is used to analyze and report on database activity, support breach investigations, and alert on anomalies. DAM is typically performed continuously and in real-time.

In computing, CloudTran, a transaction management product, enables applications running in distributed computing and cloud computing architectures to embed logical business transactions that adhere to the properties of ACID transactions. Specifically, CloudTran coordinates ACID transactionality for data stored within in-memory data grids, as well as from the data grid to persistent storage systems.

The term in-memory processing is used for two different things:

  1. In computer science, in-memory processing (PIM) is a computer architecture in which data operations are available directly on the data memory, rather than having to be transferred to CPU registers first. This may improve the power usage and performance of moving data between the processor and the main memory.
  2. In software engineering, in-memory processing is a software architecture where a database is kept entirely in random-access memory (RAM) or flash memory so that usual accesses, in particular read or query operations, do not require access to disk storage. This may allow faster data operations such as "joins", and faster reporting and decision-making in business.

SingleStore – Database management system

SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

GigaSpaces Technologies Inc. is a privately held software company, established in 2000, with its headquarters in New York City and additional offices in Europe, Asia, and Israel.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

SAP HANA – Database management system by SAP

SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE. Its primary function as the software running a database server is to store and retrieve data as requested by the applications. In addition, it performs advanced analytics and includes extract, transform, load (ETL) capabilities as well as an application server.

SequoiaDB is a multi-model NewSQL database.

Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition, although total cost of ownership, not just infrastructure cost, must be considered.

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.

References

  1. "Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation". Gartner. 28 January 2014. Retrieved 4 October 2021.
  2. "Market Guide for HTAP-Enabling In-Memory Computing Technologies". www.gartner.com. Archived from the original on 17 May 2017. Retrieved 15 April 2017.
  3. "Hype Cycle for Data Management, 2019". Gartner.
  4. "Forrester". Forrester.
  5. https://mariadb.com/wp-content/uploads/2020/09/enterprises-turn-to-hoap-for-oltp-workloads_analyst-report_1088.pdf
  6. Bog, Anja. Benchmarking Transaction and Analytical Processing Systems: The Creation of a Mixed Workload Benchmark and Its Application. Springer-Verlag Berlin Heidelberg, 2014.
  7. Pezzini, Massimo; Feinberg, Donald; Rayner, Nigel; Edjlali, Roxane. "Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation". Gartner. 28 January 2014.
  8. "Azure Analytics: Clarity in an instant". azure.microsoft.com. 19 May 2020. Retrieved 20 June 2020.
  9. Wolpe, Toby. "SQL and NoSQL? Fine, but how does the hybrid database fit in?" ZDNet. 12 May 2014.
  10. "How to Enable Digital Business Innovation via Hybrid Transaction/Analytical Processing". www.gartner.com. Retrieved 15 April 2017.
  11. Baer, Tony. "Fast Data hits the Big Data fast lane." ZDNet. 16 April 2012
  12. "TiDB: A Raft-based HTAP Database" (PDF). Proceedings of the VLDB Endowment. 13 (12): 3072. doi:10.14778/3415478.3415535. ISSN   2150-8097. S2CID   221666363.