Greenplum

Greenplum
Company type Product of Broadcom
Industry Big data technologies
Headquarters Palo Alto, California
Products Database management system software
Greenplum Database
Developer(s) Broadcom
Stable release 7.0.0 / September 28, 2023
Operating system Linux
Type Database management system
Website greenplum.org

Greenplum is a big data technology based on a massively parallel processing (MPP) architecture and the open source PostgreSQL database. The technology was created around 2005 by a company of the same name headquartered in San Mateo, California. Greenplum was acquired by EMC Corporation in July 2010. [1]

Starting in 2012, its database management system software became known as the Pivotal Greenplum Database, sold through Pivotal Software. Pivotal open sourced the core engine, and development continued through the Greenplum Database open source community and Pivotal.

Pivotal was acquired by VMware, [2] and from 2020 VMware continued to sponsor the Greenplum Database open source community while commercializing the technology under the brand name VMware Tanzu Greenplum. In November 2023, VMware was acquired by Broadcom. [3]

In May 2024, Tanzu by Broadcom decided to close the source of the Greenplum Database project. All future releases of Greenplum Database will be closed source and released as part of the VMware Tanzu Data Suite.

Company

Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies: Metapa (founded in August 2000 near Los Angeles) [4] and Didera in Fairfax, Virginia. [5]

Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures. A total of US$20 million in funding was announced at the merger. [6] Greenplum, based in San Mateo, California, released its PostgreSQL-based database management system software in April 2005, calling it Bizgres. [7] Venture capital rounds of about US$15 million each followed in March 2006 and February 2007. [8]

In July 2006 a partnership with Sun Microsystems was announced. [9] Sun, which had also acquired MySQL AB, participated in a US$27 million investment round in January 2008, led by Meritech Capital Partners. [8] The Bizgres project included a few other members and was supported through about 2008, by which time the product was simply called "Greenplum". [10] [11] The Sun Fire X4500 was a reference architecture and was used by the majority of customers until a transition was made to Linux around that time.

Greenplum was acquired by EMC Corporation in July 2010, becoming the foundation of EMC's big data software division. [1] Although EMC did not disclose the value, it was estimated at US$300 million. [12] [13] Greenplum's products at the time of acquisition were the Greenplum Database, Chorus (a management tool), and Data Science Labs. Greenplum had customers in a range of vertical markets, eBay among them. [14] It became part of Pivotal Software in 2012. [15]

A variant called Hawq, which uses Apache Hadoop to store data in the Hadoop file system, was announced in 2013. [16] [17] In 2015 the GreenplumDB and Hawq open source software projects were announced. [18]

Technology

Pivotal's Greenplum database product uses massively parallel processing (MPP) techniques. Each computer cluster consists of a master node, a standby master node, and segment nodes. [19] All of the data resides on the segment nodes, and the catalog information is stored in the master nodes. Segment nodes run one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For each table, the data is divided among the segment nodes based on the distribution column keys that the user specifies in the data definition language. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a query enters the master node, it is parsed, planned and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table. Queries are presented to the system in the Structured Query Language, version SQL:2003. Transaction semantics comply with the constraints known as ACID. [20]
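The distribution and dispatch steps described above can be illustrated with a minimal Python sketch. It is not Greenplum code: the segment count, the function names (`segment_for`, `distribute`, `scatter_gather_sum`) and the use of CRC32 as the hash are all assumptions made for illustration, and Greenplum's actual hash function and executor differ.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of primary segments


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment content identifier.

    CRC32 is a deterministic stand-in; it is NOT Greenplum's hash function.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Place each row on the segment chosen by its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


def scatter_gather_sum(segments, column):
    """Imitate the master: every segment computes a partial sum,
    then the partial results are combined into the final answer."""
    partials = {seg: sum(r[column] for r in rows) for seg, rows in segments.items()}
    return sum(partials.values())


# Toy table with a hypothetical distribution key "customer_id".
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
segments = distribute(rows, "customer_id")
total = scatter_gather_sum(segments, "amount")
print(total)
```

Because the segment is derived only from the key value, rows with the same key always land on the same segment. This is why the choice of distribution column affects how evenly data and work are spread across the cluster.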

Competitors include other MPP database management systems provided by major vendors such as Teradata, Amazon Redshift, Microsoft Azure, Alibaba AnalyticDB and, in the past, IBM Netezza. [19] [21] Additional competition comes from smaller vendors, from column-oriented databases such as HP Vertica and Exasol, and from data warehousing vendors with non-MPP architectures, such as Oracle Exadata, IBM Db2 and SAP HANA.

Greenplum Version 7

In September 2023, Greenplum Database Version 7 was released. [22] Version 7 is based on PostgreSQL version 12.12.

Greenplum Version 6

In September 2019, Greenplum Database Version 6 was released. Version 6 is based on PostgreSQL version 9.4 and features large gains in OLTP performance. [23] Greenplum 6 was covered by several media outlets, which noted its alignment with open source Postgres [24] and its OLTP performance. [25]

Greenplum Version 5

In September 2017, Greenplum Database Version 5 was released. Version 5 was the first release under the Greenplum project strategy of merging later PostgreSQL versions back into Greenplum; it is based on PostgreSQL version 8.3, up from 8.2 in the previous version. [26] Version 5 also introduced the general availability of the GPORCA optimizer [27] for cost-based optimization of SQL designed for big data.

References

  1. "EMC to Acquire Greenplum". Press release. EMC Corporation. July 6, 2010. Retrieved March 15, 2017.
  2. Haranas, Mark. "5 Things You Need To Know About VMware's Acquisition Of Pivotal | CRN". www.crn.com. Retrieved 2024-10-02.
  3. "Chipmaker Broadcom completes $69bn deal to buy VMware". 2023-11-23. Retrieved 2024-06-05.
  4. "Form D: Notice of Sale of Securities" (PDF). US SEC. July 30, 2003. Retrieved March 15, 2017.
  5. Maureen O'Gara (September 26, 2003). "Metapa Buys Didera". Linux Business News. Retrieved March 15, 2017.
  6. "Metapa Acquires Didera and Closes Additional Funding; Industry Pioneers in High-Performance Computing Combine to Create Breakthrough Linux Database Clustering Solution for Decision Support". Press release. September 23, 2003.
  7. "Bizgres project launched". PostgreSQL developer's web site. April 17, 2005. Retrieved March 15, 2017.
  8. Duncan Riley (January 21, 2008). "Greenplum Takes $27 Million Series C". TechCrunch. Retrieved March 15, 2017.
  9. Colin White; Richard Hackathorn (June 26, 2007). "Sun/Greenplum". Business Intelligence Best Practices. Retrieved March 15, 2017.
  10. "History". Old Bizgres.org web site. Archived from the original on December 22, 2008. Retrieved March 15, 2017.
  11. "Greenplum Updates Open-Source Based Database". Information Week. February 22, 2008. Retrieved March 15, 2017.
  12. Om Malik (July 6, 2010). "Big Data = Big Money: EMC Buys Greenplum". GigaOm. Archived from the original on October 20, 2016. Retrieved March 15, 2017.
  13. Alexander Haislip (July 7, 2010). "Microsoft, Sun, And SAP Surprising Winners In Greenplum Sale". Forbes. Retrieved March 15, 2017.
  14. "ebay's two enormous data warehouses". DBMS2 blog. Monash Research. April 30, 2009. Retrieved March 15, 2017.
  15. Timothy Prickett Morgan (March 20, 2012). "EMC wants to be the Linux of big data: Opens up Chorus tool, borgs agile coders Pivotal Labs". The Register. Retrieved March 15, 2017.
  16. "When should I use Greenplum Database versus HAWQ?". Pivotal Guru web site. January 31, 2014. Retrieved March 15, 2017.
  17. Timothy Prickett Morgan (February 25, 2013). "EMC morphs Hadoop elephant into SQL database Hawq". The Register. Retrieved March 15, 2017.
  18. Cade Metz (February 17, 2015). "Pivotal Doubles Down on Open Source in a Sign of Changing Software World". Wired. Retrieved March 15, 2017.
  19. Timothy Prickett Morgan (April 6, 2011). "EMC gets fat and flashy with Greenplum appliances: Take that, Teradata, Exadata, Netezza". The Register. Retrieved March 18, 2017.
  20. Sunila Gollapudi (2013). Getting Started with Greenplum for Big Data Analytics. Packt Publishing. ISBN 978-1-78217-705-0.
  21. "System Properties Comparison Amazon Redshift vs. Greenplum vs. Microsoft Azure SQL Database vs. Teradata Aster". DB-engines. Retrieved March 18, 2017.
  22. "VMware Greenplum 7.x Release Notes". 2 October 2023.
  23. "Greenplum 6 OLTP Benchmarks". 15 May 2019.
  24. "Pivotal's Greenplum database is about to finally align with the open source project. What will that mean for the platform?". ZDNet.
  25. "Substantial rev of the open source, MPP data warehouse offers high concurrency, embedded analytics, and data science capabilities". 7 November 2019.
  26. "Pivotal Greenplum is alive and kicking". ZDNet. Retrieved September 14, 2017.
  27. "Orca: A Modular Query Optimizer Architecture for Big Data" (PDF). ZDNet. Retrieved April 14, 2016.