Actian Vector

Actian Vector
Developer(s): Actian Corporation
Stable release: Vector 6.3 / December 9, 2022 [1]
Operating system: Cross-platform
Type: RDBMS
License: Proprietary
Website: www.actian.com/products/analytics-platform/vector-smp-analytics-database/

Actian Vector in Hadoop
Developer(s): Actian Corporation
Stable release: Vector in Hadoop 6.0 / April 24, 2020 [2]
Operating system: Linux
Type: RDBMS
License: Proprietary
Website: www.actian.com/analytic-database/vectorh-sql-hadoop/

Actian Vector (formerly known as VectorWise) is an SQL relational database management system designed for high performance in analytical database applications. [3] It published record-breaking results on the Transaction Processing Performance Council's TPC-H benchmark for database sizes of 100 GB, 300 GB, 1 TB and 3 TB on non-clustered hardware. [4] [5] [6] [7]

Vectorwise originated from the X100 research project carried out at the Centrum Wiskunde & Informatica (CWI, the Dutch National Research Institute for Mathematics and Computer Science) between 2003 and 2008. It was spun off as a start-up company in 2008 and acquired by Ingres Corporation in 2011. [8] It was released as a commercial product in June 2010, [9] [10] [11] [12] initially for the 64-bit Linux platform and later also for Windows. Starting with the 3.5 release in April 2014, the product name was shortened to "Vector". [13] In June 2014, Actian Vortex was announced as a clustered massively parallel processing (MPP) version of Vector that runs in Hadoop with storage in HDFS. [14] [15] Actian Vortex was later renamed Actian Vector in Hadoop.

Technology

The basic architecture and design principles of the X100 engine of the VectorWise database are described in the PhD theses of two VectorWise founders, Marcin Żukowski ("Balancing Vectorized Query Execution with Bandwidth-Optimized Storage") [16] and Sandor Héman ("Updating Compressed Column Stores"), [17] both supervised by another founder, Professor Peter Boncz. The X100 engine was integrated with the Ingres SQL front-end, allowing the database to use the Ingres SQL syntax and the Ingres set of client and database administration tools. [18]

The query execution architecture uses "Vectorized Query Execution": data is processed in chunks of cache-fitting vectors. This makes it possible to apply the principles of vector processing and single instruction, multiple data (SIMD), performing the same operation on multiple data items simultaneously and exploiting data-level parallelism on modern hardware. It also reduces the overheads of the traditional "row-at-a-time" processing found in most RDBMSes.
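The difference between the two processing styles can be illustrated with a minimal Python sketch. It is not Vector's implementation: the function names, the query (`SUM(price * qty) WHERE qty > min_qty`) and the vector size of 1024 are all assumptions made for illustration, and pure Python does not use SIMD instructions. The sketch only shows the control flow, in which each operator primitive is called once per vector rather than once per row.

```python
VECTOR_SIZE = 1024  # assumed vector length, small enough to stay cache-resident


def scan(column, size=VECTOR_SIZE):
    """Yield consecutive fixed-size slices (vectors) of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_greater(values, threshold):
    """Primitive: positions within one vector where value > threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def multiply(a, b, selection):
    """Primitive: element-wise product at the selected positions only."""
    return [a[i] * b[i] for i in selection]


def vectorized_revenue(price, qty, min_qty):
    """SUM(price * qty) WHERE qty > min_qty, evaluated one vector at a time."""
    total = 0
    for price_vec, qty_vec in zip(scan(price), scan(qty)):
        selection = select_greater(qty_vec, min_qty)
        total += sum(multiply(price_vec, qty_vec, selection))
    return total


def row_at_a_time_revenue(rows, min_qty):
    """The same query, evaluated one (price, qty) tuple at a time."""
    total = 0
    for price, qty in rows:
        if qty > min_qty:
            total += price * qty
    return total
```

In a compiled engine, primitives such as `select_greater` and `multiply` would be tight loops over arrays, which is where SIMD and reduced per-row interpretation overhead come into play.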

The database is stored in a compressed column-oriented format, [19] with a scan-optimised buffer manager. Actian Vortex uses the same proprietary format in HDFS.
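As a rough illustration of why a column-oriented layout compresses well, the following Python sketch pivots rows into columns and applies run-length encoding to one column. The storage format itself is proprietary and is not described here; run-length encoding is used only as a stand-in, and all function names are hypothetical.

```python
from itertools import groupby


def to_columns(rows, names):
    """Pivot a list of row tuples into a dict of per-column value lists."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out
```

Because all values of a column are stored together, repeated or similar values sit next to each other, and a scan that needs only a few columns does not have to read the rest.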

Loading large amounts of data is supported through direct appends to stable storage, while small transactional updates are supported through patent-pending [20] Positional Delta Trees (PDTs), [17] [21] specialized B-tree-like structures of indexed differences on top of stable storage. These differences are patched in seamlessly during scans and are transparently propagated to stable storage by a background process. Storing differences in patch-like structures and rewriting stable storage in bulk made it possible to work in a filesystem like HDFS, in which files are append-only. [14]
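The idea of keeping differences beside an unchanged stable image and merging them at scan time can be sketched as follows. This is a deliberately simplified model, not the PDT data structure: a real PDT is a tree, whereas this sketch uses plain dictionaries keyed by stable position, and the class and method names are hypothetical.

```python
class DeltaStore:
    """Stable column image plus positional differences merged at scan time."""

    def __init__(self, stable):
        self.stable = list(stable)  # stable storage, never updated in place
        self.inserts = {}           # stable position -> values inserted before it
        self.deleted = set()        # stable positions marked as deleted
        self.modified = {}          # stable position -> replacement value

    def insert_before(self, pos, value):
        if not 0 <= pos <= len(self.stable):
            raise IndexError(pos)
        self.inserts.setdefault(pos, []).append(value)

    def delete(self, pos):
        if not 0 <= pos < len(self.stable):
            raise IndexError(pos)
        self.deleted.add(pos)

    def modify(self, pos, value):
        if not 0 <= pos < len(self.stable):
            raise IndexError(pos)
        self.modified[pos] = value

    def scan(self):
        """Yield the current column contents, patching in the differences."""
        for pos, value in enumerate(self.stable):
            yield from self.inserts.get(pos, ())
            if pos in self.deleted:
                continue
            yield self.modified.get(pos, value)
        # Values appended after the last stable position.
        yield from self.inserts.get(len(self.stable), ())

    def checkpoint(self):
        """Rewrite the stable image in bulk and discard the differences."""
        self.stable = list(self.scan())
        self.inserts, self.deleted, self.modified = {}, set(), {}
```

Updates never touch the stable image; only `checkpoint` replaces it wholesale, which is the property that fits an append-only filesystem.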

History

A comparative Transaction Processing Performance Council TPC-H performance test of MonetDB carried out by its original creator at Centrum Wiskunde & Informatica (CWI) in 2003 showed room for improvement in its performance as an analytical database. As a result, CWI researchers proposed a new architecture using pipelined query processing ("vectorised processing") to improve the performance of analytical queries. This led to the creation of the "X100" project, with the intention of designing a new kernel for MonetDB, to be called "MonetDB/X100". [16] [22] [23]

The X100 project team won the 2007 DaMoN Best Paper Award for the paper "Vectorized Data Processing on the Cell Broadband Engine" [24] [25] as well as the 2008 DaMoN Best Paper Award for the paper "DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing". [26] [27]

In August 2009 the originators of the X100 project won the "Ten Year Best Paper Award" at the 35th International Conference on Very Large Data Bases (VLDB) for their 1999 paper "Database Architecture Optimized for the New Bottleneck: Memory Access". The VLDB recognised that the project team had made great progress in implementing the ideas contained in the paper over the previous 10 years. [28] The central premise of the paper is that traditional relational database systems were designed in the late 1970s and early 1980s, when database performance was dictated by the time required to read data from and write data to hard disk. At that time CPUs were relatively slow and main memory was relatively small, so very little data could be loaded into memory at a time. Over time hardware improved, with CPU speed and memory size doubling roughly every two years in accordance with Moore's law, but the design of traditional relational database systems had not adapted. The CWI research team described improvements in database code and data structures to make best use of modern hardware. [29]

In 2008 the X100 project was spun off from MonetDB as a separate project, with its own company, and renamed "VectorWise". Co-founders included Peter A. Boncz and Marcin Żukowski. [30] [31]

In June 2010, the VectorWise technology was officially announced by Ingres Corporation, [10] [32] with the release of Ingres VectorWise 1.0. [33]

In March 2011, VectorWise 1.5 was released, [34] publishing a record-breaking result on the TPC-H 100 GB benchmark. [5] [35] New features included parallel query execution (a single query executed on multiple CPU cores), improved bulk loading and enhanced SQL support. In June 2011, VectorWise 1.6 was released, [6] publishing record-breaking results on the TPC-H 100 GB, [36] 300 GB [37] and 1 TB [38] non-clustered benchmarks.

In December 2011, VectorWise 2.0 was released [39] with new SQL support for analytical functions such as rank and percentile, enhanced date, time and timestamp data types, and support for disk spilling in hash joins and aggregation.

In June 2012, VectorWise 2.5 was released. [40] In this release the storage format was reorganized to allow the database to be stored in multiple locations, the background update propagation mechanism from PDTs to stable storage was enhanced to rewrite only the changed blocks instead of performing full rewrites, and a new patented [41] Predictive Buffer Manager (PBM) was introduced. [42]

In March 2013, VectorWise 3.0 was released. [43] New features included a more efficient storage engine, support for more data types and analytical SQL functions, enhanced DDL features, and improved access to monitoring and profiling.

In March 2014, Actian Vector 3.5 was released under its new, shortened name. [13] New features included support for partitioned tables, improved disk spilling, online backup capabilities and improved SQL support, such as MERGE/UPSERT DML operations and the FIRST_VALUE and LAST_VALUE window aggregation functions.

In June 2014, at Hadoop Summit 2014 in San Jose, Actian announced Actian Vortex, a clustered MPP version of Vector with the same level of SQL support, working in Hadoop with storage directly in HDFS. [14] Actian Vortex was later renamed Actian Vector in Hadoop, and non-clustered Actian Vector releases were also updated to match. [1] Actian Vector 4 was released in March 2015, and Actian Vector in Hadoop 4 in December 2015. [44]

In March 2019, Actian Avalanche was released as a cloud data platform, with Vector as the core engine for the Warehouse offering. [45]

Release history

Actian Vector

| Release | Status | General availability | End of Enterprise Support | End of Extended Support | End of Obsolescence Support |
|---|---|---|---|---|---|
| 6.3 | Current stable version | December 2022 | December 31, 2025 | December 31, 2027 | December 31, 2029 |
| 6.2 | Older version, still maintained | November 2021 | November 30, 2024 | November 30, 2026 | November 30, 2028 |
| 6.0 | Older version, still maintained | June 2020 | June 30, 2023 | June 30, 2025 | June 30, 2027 |
| 5.1 (Windows - extended) | Older version, still maintained | May 2018 | September 30, 2021 | September 30, 2023 | September 30, 2025 |
| 5.1 (Linux) | Older version, still maintained | May 2018 | June 30, 2021 | June 30, 2023 | June 30, 2025 |
| 5.0 | Older version, still maintained | June 2016 | June 30, 2020 | June 30, 2022 | June 30, 2024 |
| 4.x | Old version, no longer maintained | March 2015 | December 31, 2018 | December 31, 2020 | December 31, 2022 |
| 3.5.x | Old version, no longer maintained | March 2014 | March 31, 2017 | March 31, 2019 | March 31, 2021 |
| 3.0.x | Old version, no longer maintained | April 2013 | April 15, 2016 | April 30, 2017 | Not available |
| 2.5.x | Old version, no longer maintained | June 2012 | June 1, 2015 | April 30, 2017 | Not available |
| 2.0.x | Old version, no longer maintained | November 2011 | November 2011 | April 30, 2017 | Not available |

Actian Vector in Hadoop

| Release | Status | General availability | End of Enterprise Support | End of Extended Support | End of Obsolescence Support |
|---|---|---|---|---|---|
| 6.0 | Current stable version | April 24, 2020 | April 30, 2023 | April 30, 2026 | April 30, 2029 |
| 5.1 | Older version, still maintained | November 2018 | November 30, 2021 | November 30, 2023 | November 30, 2025 |
| 5.0 | Older version, still maintained | October 2018 | October 31, 2020 | October 31, 2022 | October 31, 2024 |
| 4.x | Old version, no longer maintained | December 2015 | December 31, 2018 | December 31, 2020 | December 31, 2022 |

References

  1. "Vector 6.3 Delivers Easier Administration, Greater Automation and Better Productivity for Data Analytics". 9 December 2022. Retrieved 2023-04-13.
  2. "Actian Looks To Help Firms Break Through Hadoop Constraints; Adds Real-Time, Security, ML Support". 2020-07-30. Retrieved 2023-04-13.
  3. "Vectorwise Enterprise". Actian Corporation. Retrieved 3 May 2012.
  4. "TPC-H - Top Ten Performance Results - Non-Clustered". Transaction Processing Performance Council. Retrieved 3 May 2012.
  5. "Vectorwise Smashes TPC-H Record at Scale Factor 100 Delivering 340% of Previous Best Record" (Press release). Actian Corporation. 15 February 2011. Retrieved 7 February 2016.
  6. "Vectorwise Breaks 300GB and 1TB TPC-H Benchmark Records Hands Down" (Press release). Actian Corporation. 4 May 2011. Retrieved 7 February 2011.
  7. "Actian Analytics Platform Outperforms All Others By 2X, Sets New Record In Latest TPC-H Benchmark". Actian Corporation. Retrieved 20 Aug 2016.
  8. "CWI spin-off company VectorWise sold to Ingres Corporation".
  9. Clarke, Gavin (2 February 2010). "Ingres' VectorWise rises to answer Microsoft". The Register.
  10. Babcock, Charles (9 June 2010). "Ingres Unveils VectorWise Database Engine". InformationWeek.
  11. Suleman, Khidr (8 June 2010). "Ingres launches VectorWise database engine". V3.co.uk.
  12. Zukowski, Marcin; Boncz, Peter (2012). "From x100 to vectorwise". Proceedings of the 2012 international conference on Management of Data - SIGMOD '12. p. 861. doi:10.1145/2213836.2213967. ISBN 978-1-4503-1247-9. S2CID 9187072.
  13. "Pssst: Want to Hear About Actian Vector 3.5?". 2016-05-04.
  14. "Vector(wise) goes Hadoop".
  15. "Peter Boncz - Actian Vector on Hadoop: The First Industrial-strength DBMS to Truly Leverage Hadoop". YouTube.
  16. Żukowski, Marcin (11 September 2009). "Balancing vectorized query execution with bandwidth-optimized storage" (PDF). Universiteit van Amsterdam. Retrieved 7 February 2016.
  17. Héman, Sandor (2015). "Updating Compressed Column Stores" (PDF). Vrije Universiteit Amsterdam. Retrieved 7 February 2016.
  18. Inkster, Doug; Żukowski, Marcin; Boncz, Peter (September 2011). "Integration of VectorWise with Ingres" (PDF). SIGMOD Record. 40 (3): 45–53. doi:10.1145/2070736.2070747. hdl:1871/33100. S2CID 6372175. Retrieved 7 February 2016.
  19. Zukowski, Marcin; Boncz, Peter (March 2012). "Vectorwise: Beyond Column Stores" (PDF). IEEE Data Engineering Bulletin. 35 (1): 21–27. Retrieved 4 May 2012.
  20. US application 20100235335, Sandor ABC Heman, Peter A. Boncz, Marcin Zukowski, Nicolaas J. Nes, "Column-store database architecture utilizing positional delta tree update system and methods", published 2010-09-16.
  21. Héman, Sándor; Żukowski, Marcin; Nes, Niels; Sidirourgos, Lefteris; Boncz, Peter. "Positional update handling in column stores" (PDF). SIGMOD Conference 2010: 543–554.
  22. "Homepage of Peter Boncz". Retrieved 7 February 2016.
  23. "Faster database technology with MonetDB/X100". CWI Amsterdam. Retrieved 4 May 2012.
  24. Héman, S.; Nes, N.J.; Zukowski, M.; Boncz, P.A. (2007). "Vectorized Data Processing on the Cell Broadband Engine". Universiteit van Amsterdam. Retrieved 4 May 2012.
  25. "Third International Workshop on Data Management on New Hardware (DaMoN 2007)". Carnegie Mellon's School of Computer Science (SCS). Retrieved 4 May 2012.
  26. Zukowski, Marcin; Nes, Niels; Boncz, Peter (2008). "DSM vs. NSM". Proceedings of the 4th international workshop on Data management on new hardware - DaMoN '08. p. 47. doi:10.1145/1457150.1457160. ISBN 9781605581842. S2CID 11946467.
  27. "Fourth International Workshop on Data Management on New Hardware (DaMoN 2008)". Carnegie Mellon School of Computer Science. Retrieved 4 May 2012.
  28. "10-year Best Paper Award – VLDB 2009". International Conference on Very Large Data Bases. Retrieved 4 May 2012.
  29. Boncz, Peter; Manegold, Stefan; Kersten, Martin L. (15 June 1999). Database architecture optimized for the new bottleneck: Memory access (PDF). Universiteit van Amsterdam. pp. 54–65. ISBN 1-55860-615-7. Retrieved 11 December 2013.
  30. Curt Monash (25 April 2013). "Goodbye VectorWise, farewell ParAccel?". DBMS2. Retrieved 11 December 2013.
  31. "Peter Boncz". Staff web page. CWI. Retrieved 11 December 2013.
  32. Clark, Don (22 September 2011). "Database-Software Firm Tries 'Action Apps'". The Wall Street Journal.
  33. "Ingres Vectorwise 1.0". Retrieved 7 February 2016.
  34. "An early look at Actian VectorWise 1.5".
  35. "TPC-H SF100 Vectorwise 1.5".
  36. "TPC-H SF100 Vectorwise 1.6".
  37. "TPC-H SF300 Vectorwise 1.6".
  38. "TPC-H SF1000 Vectorwise 1.6".
  39. "An even faster VectorWise".
  40. "Actian Releases Vectorwise 2.5 – Record-Breaking Database Is Now Even Faster".
  41. US patent 8825959 B1, Michal Switakowski, Peter Boncz, Marcin Zukowski, "Method and apparatus for using data access time prediction for improving data buffering policies", published 2014-09-02.
  42. Świtakowski, Michał; Boncz, Peter; Żukowski, Marcin (August 2012). "From Cooperative Scans to Predictive Buffer Management" (PDF). Proceedings of the VLDB Endowment. 5 (12). VLDB 2012: 1759–1770. arXiv:1208.4170. Bibcode:2012arXiv1208.4170S. doi:10.14778/2367502.2367515. S2CID 17184937. Retrieved 7 February 2016.
  43. "Actian Announces Availability of Vectorwise 3.0 for Getting Fast Answers from Big Data".
  44. "Lifecycle Dates - Actian Vector and Vector in Hadoop".
  45. "Actian Avalanche Real-Time Connected Data Warehouse adds integration".