Vertica

Industry Database management & Data warehousing
Founded 2005
Founders Andrew Palmer and Michael Stonebraker
Headquarters Cambridge, MA, United States
Key people Mark Barrenechea (CEO and CTO)
Products Vertica Analytics Database, Vertica SQL on Data Lake
Parent OpenText
Website www.vertica.com

Vertica is an analytic database management software company. [1] [2] Vertica was founded in 2005 by the database researcher Michael Stonebraker, with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch later served as CEO.

Lynch joined as Chairman and CEO in 2010 and was responsible for Vertica's acquisition by Hewlett Packard in March 2011. [3] [4] The acquisition expanded the HP Software portfolio for enterprise companies and the public sector group. [5] As part of the merger of Micro Focus and the Software division of Hewlett Packard Enterprise, Vertica joined Micro Focus in September 2017. [6] As part of OpenText's acquisition of Micro Focus, Vertica joined OpenText in January 2023.

Products

The column-oriented Vertica Analytics Database was designed to manage large, fast-growing volumes of data and to provide fast query performance for data warehouses and other query-intensive applications. The product claims to greatly improve query performance over traditional relational database systems, and to provide high availability and exabyte scalability on commodity enterprise servers. Vertica runs on multiple cloud computing systems as well as on Hadoop nodes. Vertica's Eon Mode separates compute from storage, using S3 object storage and dynamic allocation of compute nodes. [7]
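
As a rough illustration of why a column-oriented layout suits query-intensive workloads, the following minimal Python sketch stores the same small table by row and by column and computes one aggregate over each. It is a generic toy model of columnar storage, not Vertica's implementation; the table contents, column names, and helper functions are all hypothetical.

```python
# Toy illustration of row-oriented vs. column-oriented layouts.
# Not Vertica code: the data and function names are hypothetical.

rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_row_store(records):
    """Row layout: every full record is visited to read one field."""
    return sum(record["amount"] for record in records)


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    return sum(columns["amount"])


columns = to_columns(rows)
row_total = sum_amount_row_store(rows)
column_total = sum_amount_column_store(columns)
print(row_total, column_total)
```

Both functions return the same total; the difference is that the column layout touches only the values the query needs, and values of one type stored together tend to compress well.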

Vertica's design features include:

Vertica's specialized approach aims to significantly increase query performance in data warehouses, while reducing hardware costs. [12]

Since 2011, Vertica has offered a limited-capacity community edition for free. [13]

In July 2021, Vertica announced a SaaS offering, Vertica Accelerator, running on Amazon Web Services (AWS). [14]

Optimizations

Vertica originated as the C-Store column-oriented database, an open source research project at MIT and other universities, published in 2005. [15] [16]

Vertica runs on clusters of commodity servers or on commercial clouds. It integrates with Hadoop, [17] using HDFS.

In 2018, Vertica introduced Vertica in Eon Mode, an architecture that separates compute from storage. The Eon architecture allows compute capacity to be increased or decreased elastically as workloads change. It also allows instantiation of multiple isolated sub-clusters dedicated to different workloads while maintaining a single shared data repository. It operates on shared object storage in the cloud, and also runs on object-storage-compatible hardware on-premises for private cloud implementations.
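
The idea of isolated sub-clusters sharing one data repository can be sketched with a small toy model. The Python below is only an illustration of separating compute from storage under simplified assumptions; it does not use or describe any Vertica API, and the class names, method names, and data are hypothetical.

```python
# Toy model of separated compute and storage; not Vertica's API.
# All class and method names here are hypothetical.


class SharedStorage:
    """Stands in for a single shared object store (e.g. a bucket)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        self.objects[key] = value

    def get(self, key):
        return self.objects[key]


class Subcluster:
    """A group of compute nodes that reads from the shared store."""

    def __init__(self, name, storage, node_count):
        self.name = name
        self.storage = storage
        self.node_count = node_count

    def resize(self, node_count):
        # Compute changes size; stored data is untouched.
        if node_count < 1:
            raise ValueError("a subcluster needs at least one node")
        self.node_count = node_count

    def query_total(self, key):
        return sum(self.storage.get(key))


storage = SharedStorage()
storage.put("sales", [10, 20, 30])

etl = Subcluster("etl", storage, node_count=3)
dashboards = Subcluster("dashboards", storage, node_count=2)

dashboards.resize(6)  # scale up for a busy period
etl.resize(1)         # scale down when loading is idle

print(etl.query_total("sales"), dashboards.query_total("sales"))
```

In this sketch, resizing either sub-cluster changes only its node count; both continue to read the same stored data, which is the property the paragraph above describes.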

Version 10.1.1 of Vertica introduced Docker and Kubernetes support. [18]

Many BI, data visualization, and ETL tools work with the Vertica Analytics Platform. Vertica supports Kafka for streaming data ingestion.

In 2021, Vertica released a connector for Spark. [19]

Vertica also integrates with Grafana, Helm, Go, and Distributed R. [20]

Company events

In January 2008, Sybase filed a patent-infringement lawsuit against Vertica. [21] In January 2010, Vertica prevailed in a preliminary hearing, [22] and in June 2010, Sybase and Vertica resolved the suit, with the court dismissing all infringement claims. [23]

Since 2013, Vertica has held an annual user conference, now called Vertica Unify. [24]

References

  1. Network World staff: "New database company raises funds, nabs ex-Oracle bigwigs", LinuxWorld, February 14, 2007
  2. Brodkin, J: "10 enterprise software companies to watch", Network World, April 11, 2007. Archived 2007-05-18 at the Wayback Machine.
  3. HP News Release: "HP to Acquire Vertica: Customers Can Analyze Massive Amounts of Big Data at Speed and Scale" Feb. 2011
  4. HP News Release: "HP Completes Acquisition of Vertica Systems, Inc." March 22, 2011.
  5. ComputerWorld.com: "Update: HP to buy Vertica for analytics." Kanaracus. Feb. 2011.
  6. SiliconAngle: "Vertica survives software industry turmoil to emerge as key cloud and big data player" Albertson.
  7. Press Release: "Micro Focus Announces Vertica in Eon Mode for Pure Storage" Sept 17, 2019
  8. Monash, C: "Are row-oriented RDBMS obsolete?" DBMS2, January 22, 2007
  9. Monash, C: "Mike Stonebraker on database compression – comments", DBMS2, March 24, 2007
  10. Gagliordi, Natalie. "HP adds scale to open-source R in latest big data platform". ZDNet. Retrieved 17 February 2015.
  11. Prasad, Shreya; Fard, Arash; Gupta, Vishrut; Martinez, Jorge; LeFevre, Jeff; Xu, Vincent; Hsu, Meichun; Roy, Indrajit (2015). "Enabling predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction". ACM SIGMOD International Conference on Management of Data.
  12. One Size Fits All? Part 2: Benchmarking Results (sect. 3.1)
  13. "Vertica Announces Community Edition Version of Vertica Analytic Database". Archived from the original on July 4, 2015. Retrieved August 17, 2016.
  14. PR Newswire: "Vertica Announces Early Access of Vertica Accelerator" Micro Focus. June 15, 2021.
  15. Stonebraker, Mike; Abadi, Daniel J.; Batkin, Adam; Chen, Xuedong; Cherniack, Mitch; Ferreira, Miguel; Lau, Edmond; Lin, Amerson; Madden, Sam; O'Neil, Elizabeth; O'Neil, Pat; Rasin, Alex; Tran, Nga; Zdonik, Stan (2018). "C-store: a column-oriented DBMS". In Brodie, Michael L. (ed.). Making Databases Work: The Pragmatic Wisdom of Michael Stonebraker. Association for Computing Machinery/Morgan & Claypool. pp. 491–518. doi:10.1145/3226595.3226638. ISBN 9781947487192. S2CID 3439184.
  16. "The Vertica Analytic Database: C-Store 7 Years Later" (PDF). VLDB. August 28, 2012.
  17. "Vertica-Hadoop integration". DBMS2. October 12, 2010.
  18. Vertica Blog: "Vertica 10.1.1 Goes Beyond Analytics with Support for Azure Cloud, Kubernetes, and Containers" Healey. April 30, 2021
  19. "Vertica Spark Connector". GitHub. 25 February 2022.
  20. "Vertica". GitHub.
  21. Sybase, Inc. v. Vertica Systems, Inc. (Texas Eastern District Court, January 30, 2008).
  22. Monash, C: "Vertica slaughters Sybase in patent litigation", DBMS2, January 14, 2010
  23. Vertica Press Release, "Vertica Resolves Sybase Patent Lawsuits" http://www.vertica.com/news/press/vertica-resolves-sybase-patent-lawsuits/
  24. https://events.vertica.com/unify