Data warehouse appliance

In computing, the term data warehouse appliance (DWA) was coined by Foster Hinshaw [1] [2] for a computer architecture for data warehouses (DW) marketed specifically for big data analysis and discovery. Such a system is simple to use (it is not a pre-configuration) and delivers high performance for that workload. A DWA includes an integrated set of servers, storage, operating systems, and databases.

In marketing, the term evolved to include pre-installed and pre-optimized hardware and software as well as similar software-only systems [3] promoted as easy to install on specific recommended hardware configurations or preconfigured as a complete system. [4] [5] These are marketing uses of the term and do not reflect the technical definition.

A DWA is designed specifically for high-performance big data analytics and is delivered as an easy-to-use packaged system. DW appliances are marketed for data volumes in the terabyte to petabyte range.

Technology

The data warehouse appliance (DWA) has several characteristics which differentiate that architecture from similar machines in a data center, such as an enterprise data warehouse (EDW).

  1. A DWA has very tight integration of its internal components, which are optimized for "data-centric" operations in contrast to "compute-centric" operations. The latter tend to emphasize the number of CPUs, cores and network bandwidth.
  2. A DWA is trivial to use and install. In contrast to a "pre-configuration" of components, a DWA has very few configuration switches or options. Eliminating such options significantly reduces configuration errors, the number one cause of failure in large systems.
  3. A DWA is optimized for analytics on big data. In contrast, preceding architectures (including parallel ones) treated the "enterprise data warehouse" as a general-purpose repository for data and supported analytics as an ancillary task.

Most DW appliances use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "shared-nothing architecture" where each server operates self-sufficiently and controls its own memory and disk. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows DW appliances to resolve a relational query by scanning data on each server in parallel. The divide-and-conquer approach delivers high performance and scales linearly as new servers are added into the architecture.
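The scatter-and-gather pattern described above can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the node count, the `store` distribution key, the sample rows and all function names are hypothetical, and threads merely stand in for independent servers, so the sketch shows the data flow rather than a real speed-up.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical appliance size (assumption for this sketch)

# Hypothetical fact table: one row per sale.
SALES = [
    {"store": "north", "amount": 10},
    {"store": "south", "amount": 25},
    {"store": "north", "amount": 5},
    {"store": "east", "amount": 40},
    {"store": "west", "amount": 15},
    {"store": "south", "amount": 30},
]


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution key to a node with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Place each row on exactly one node's private storage (shared-nothing)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row["store"], num_nodes)].append(row)
    return nodes


def local_scan(partition):
    """Scan only the local partition and return partial sums per store."""
    totals = Counter()
    for row in partition:
        totals[row["store"]] += row["amount"]
    return totals


def parallel_query(nodes):
    """Run the scan on every node concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_scan, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums
    return dict(merged)
```

Calling `parallel_query(distribute(SALES))` answers the equivalent of `SELECT store, SUM(amount) ... GROUP BY store`. Because rows are distributed on the grouping key, each store's rows live on a single node and the final merge only combines disjoint partial results.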

History

"Data warehouse appliance" is a term coined by Foster Hinshaw, [1] [2] the founder of Netezza. In creating the first data warehouse appliance, Hinshaw and Netezza used the foundations developed by Model 204, Teradata, and others, to pioneer a new category to address consumer analytics efficiently by providing a modular, scalable, easy-to-manage database system that’s cost effective.

MPP database architectures have a long pedigree. Some consider Teradata's initial product, or Britton Lee's, to be the first DW appliance. [6] [7] Teradata acquired Britton Lee (by then renamed ShareBase) in June 1990. [8] Others disagree, considering appliances a "disruptive technology" for Teradata. [9]

Additional vendors, including Tandem Computers and Sequent Computer Systems, also offered MPP architectures in the 1980s. Open source and commodity computing components aided a re-emergence of MPP data warehouse appliances. Advances in technology reduced costs and improved performance in storage devices, multi-core CPUs and networking components. Open-source RDBMS products, such as Ingres and PostgreSQL, reduced software-license costs and allowed DW-appliance vendors to focus on optimization rather than on providing basic database functionality. Open-source Linux became a common operating system for DW appliances.

Other DW appliance vendors use specialized hardware and advanced software instead of MPP architectures. [10] Netezza announced a "data appliance" in 2003 and used specialized field-programmable gate array hardware. [11] Kickfire followed in 2008 with what it called a dataflow "SQL chip". [12]

In 2009, more DW appliances emerged. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Netezza introduced its TwinFin platform based on commodity IBM hardware. Other DW appliance vendors also partnered with major hardware vendors. DATAllegro, prior to its acquisition by Microsoft, partnered with EMC Corporation and Dell and implemented open-source Ingres on Linux. Greenplum had a partnership with Sun Microsystems and implemented Greenplum Database (based on PostgreSQL) on Solaris using the ZFS file system. HP Neoview used HP NonStop SQL.

The market has also seen the emergence of data-warehouse bundles, in which vendors combine their hardware and database software as a data warehouse platform. The Oracle Optimized Warehouse Initiative combines the Oracle Database with hardware from various computer manufacturers (Dell, EMC, HP, IBM, SGI and Sun Microsystems). Oracle's Optimized Warehouses offer pre-validated configurations, and the database software comes pre-installed. In September 2008, Oracle began offering a more classic appliance, the HP Oracle Database Machine, a jointly developed and co-branded platform that Oracle sold and supported and that HP built in configurations specifically for Oracle. [13] [14] In September 2009, Oracle released a second-generation Exadata system, based on its acquired Sun Microsystems hardware. [15]

Related Research Articles

Oracle Corporation: American multinational computer corporation

Oracle Corporation is an American multinational computer technology company headquartered in Austin, Texas, United States. In 2020, Oracle was the third-largest software company in the world by revenue and market capitalization. In 2023, the company ranked 80th in the Forbes Global 2000. The company sells database software and cloud computing. Oracle's core application software is a suite of enterprise software products, such as enterprise resource planning (ERP) software, human capital management (HCM) software, customer relationship management (CRM) software, enterprise performance management (EPM) software, Customer Experience Commerce (CX Commerce) and supply chain management (SCM) software.

IBM Db2: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node in a computer cluster. The intent is to eliminate contention among nodes. Nodes do not share the same memory or storage.

Oracle Database is a proprietary multi-model database management system produced and marketed by Oracle Corporation.

Business intelligence software is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that has been previously stored, often - though not necessarily - in a data warehouse or data mart.

In database computing, Oracle Real Application Clusters (RAC) — an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i — provides software for clustering and high availability in Oracle database environments. Oracle Corporation includes RAC with the Enterprise Edition, provided the nodes are clustered using Oracle Clusterware.

The IBM Data Warehousing Balanced Configuration Unit is a family of data warehousing servers from IBM. IBM introduced the Balanced Configuration Unit (BCU) for AIX in 2005, and the BCU for Linux in 2006. The BCU is a "balanced" combination of computer server hardware combined with DB2 Data Warehouse Edition software to form a data warehouse "appliance like" system to compete with systems such as Greenplum, DATAllegro, Netezza Performance Server, and Teradata.

Dataupia was a supplier of data warehouse appliances. Dataupia focused on data warehousing for applications running on Oracle and Microsoft SQL Server databases. Dataupia's Satori Server included server computers, storage, and software.

Britton Lee, Inc.: American relational database company

Britton Lee, Inc. was a pioneering relational database company. Renamed ShareBase, it was acquired by Teradata in June 1990.

Netezza: Provider of integrated data warehouse hardware and software

IBM Netezza is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.

Greenplum

Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired by EMC Corporation in July 2010.

Oracle Exadata

The Oracle Exadata Database Machine (Exadata) is a computing platform optimized for running Oracle Databases.

In-database processing, sometimes referred to as in-database analytics, refers to the integration of data analytics into data warehousing functionality. Today, many large databases, such as those used for credit card fraud detection and investment bank risk management, use this technology because it provides significant performance improvements over traditional methods.

Exalogic is a computer appliance made by Oracle Corporation, commercially available since 2010. It is a cluster of x86-64-servers running Oracle Linux or Solaris preinstalled.

The term in-memory processing is used for two different things:

  1. In computer science, in-memory processing (PIM) is a computer architecture in which data operations are available directly on the data memory, rather than having to be transferred to CPU registers first. This may improve the power usage and performance of moving data between the processor and the main memory.
  2. In software engineering, in-memory processing is a software architecture where a database is kept entirely in random-access memory (RAM) or flash memory so that usual accesses, in particular read or query operations, do not require access to disk storage. This may allow faster data operations such as "joins", and faster reporting and decision-making in business.

The Oracle data appliance consists of hardware and software from Oracle Corporation sold as a computer appliance. It was announced in 2011 and is used for consolidating and loading unstructured data into Oracle Database software. Oracle was co-founded by Larry Ellison.

HP ConvergedSystem is a portfolio of system-based products from Hewlett-Packard (HP) that integrates preconfigured IT components into systems for virtualization, cloud computing, big data, collaboration, converged management, and client virtualization. Composed of servers, storage, networking, and integrated software and services, the systems are designed to address the cost and complexity of data center operations and maintenance by pulling the IT components together into a single resource pool so they are easier to manage and faster to deploy. Where previously it would take three to six months from the time of order to get a system up and running, it now reportedly takes as few as 20 days with the HP ConvergedSystem.

The Oracle Database Appliance (ODA) is a database server appliance made by Oracle Corporation. It was introduced in September 2011 as the mid-market offering in Oracle's family of full-stack, integrated systems the company calls engineered systems. The ODA is a single rack-mounted device providing a highly-available two-node clustered database server.

A Block Range Index, or BRIN, is a database indexing technique. BRINs are intended to improve performance with extremely large tables.

References

  1. "Introducing 'data warehouse appliances' - Infostor.com®". May 18, 2007.
  2. Swoyer, Stephen (2007-05-23). "Still Another Data Warehouse Appliance Is Coming!". TDWI.
  3. "Queries From Hell blog » When is an appliance not an appliance?".
  4. "Data warehouse appliances – fact and fiction | DBMS 2 : DataBase Management System Services".
  5. Omer Trajman, Alain Crolotte, David Steinhoff, Raghunath Nambiar, Meikel Poess: Database Are Not Toasters: A Framework for Comparing Data Warehouse Appliances
  6. Kobielus, James (April 22, 2008). "Teradata Goes Appliance, Officially". Archived from the original on September 29, 2011. Retrieved 2011-01-14. Teradata effectively established the DW appliance market a quarter-century ago when it rolled out the first in a long line of preconfigured, preoptimized solutions that combine CPUs, storage, software, and database to address the most demanding analytical and decision support requirements
  7. "Database machines and data warehouse appliances – the early days". Monash Research. September 15, 2008. Retrieved 2011-01-15. But for all practical purposes, the first two significant "database machine" vendors were Britton-Lee and Teradata. And since Britton-Lee eventually sold out to Teradata (after a brief name change to ShareBase), Teradata is entitled to whatever historical glory accrues from having innovated the database management appliance category.
  8. Todd White (November 5, 1990). "Teradata Corp. suffers first quarterly loss in four years". Los Angeles Business Journal. Retrieved 2008-07-14.
  9. All, Ann (Apr 6, 2007). "Will a Data Warehouse Appliance Work for You?". Retrieved 2011-01-14. DATAllegro has a site at Sears. Sears uses [the appliance] as a front end to their Teradata warehouse to calculate aggregates. So when they want to do slice-and-dice, how many we sold in which stores and of what color, they use the appliance...I think [appliances] could be a disruptive technology for Teradata
  10. "TPC-H - Top Ten Price/Performance Results". www.tpc.org. Archived from the original on 2020-04-23.
  11. "Netezza Performance Server (NPS™) 8000 Series". Product web page. Netezza. Archived from the original on February 3, 2004. Retrieved August 16, 2013.
  12. "Kickfire". Archived from the original on 2009-05-24. Retrieved 2009-07-18.
  13. "Oracle Exadata Storage Server. Part I." September 24, 2008.
  14. "Oracle Exadata - What is the benefit?". Archived from the original on 2008-11-20. Retrieved 2008-11-19.
  15. Alex Gorbachev (September 15, 2009). "Unveiling the OLTP Oracle Database Machine & Exadata v2". Blog. Pythian. Retrieved August 16, 2013.