OLAP cube

Last updated January 17, 2023

An OLAP cube is a multi-dimensional array of data.^[1] Online analytical processing (OLAP)^[2] is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than three.

Terminology

A cube can be considered a multi-dimensional generalization of a two- or three-dimensional spreadsheet. For example, a company might wish to summarize financial data by product, by time-period, and by city to compare actual and budget expenses. Product, time, city and scenario (actual and budget) are the data's dimensions.^[3]

Cube is a shorthand for multidimensional dataset, given that data can have an arbitrary number of dimensions . The term hypercube is sometimes used, especially for data with more than three dimensions. A cube is not a "cube" in the strict mathematical sense, as the sides are not all necessarily equal. But this term is used widely.

A Slice is a term for a subset of the data, generated by picking a value for one dimension and only showing the data for that value (for instance only the data at one point in time). Spreadsheets are only 2-dimensional, so by (continued) slicing or other techniques, it becomes possible to visualise multidimensional data in them.

Each cell of the cube holds a number that represents some measure of the business, such as sales, profits, expenses, budget and forecast.

OLAP data is typically stored in a star schema or snowflake schema in a relational data warehouse or in a special-purpose data management system. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

Hierarchy

The elements of a dimension can be organized as a hierarchy,^[4] a set of parent-child relationships, typically where a parent member summarizes its children. Parent elements can further be aggregated as the children of another parent.^[5]

For example, May 2005's parent is Second Quarter 2005 which is in turn the child of Year 2005. Similarly cities are the children of regions; products roll into product groups and individual expense items into types of expenditure.

Operations

Conceiving data as a cube with hierarchical dimensions leads to conceptually straightforward operations to facilitate analysis. Aligning the data content with a familiar visualization enhances analyst learning and productivity.^[5] The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up, and pivot.

Slice is the act of picking a rectangular subset of a cube by choosing a single value for one of its dimensions, creating a new cube with one fewer dimension.^[5] The picture shows a slicing operation: The sales figures of all sales regions and all product categories of the company in the year 2005 and 2006 are "sliced" out of the data cube.

Dice: The dice operation produces a subcube by allowing the analyst to pick specific values of multiple dimensions.^[6] The picture shows a dicing operation: The new cube shows the sales figures of a limited number of product categories, the time and region dimensions cover the same range as before.

Drill Down/Up allows the user to navigate among levels of data ranging from the most summarized (up) to the most detailed (down).^[5] The picture shows a drill-down operation: The analyst moves from the summary category "Outdoor protective equipment" to see the sales figures for the individual products.

Roll-up: A roll-up involves summarizing the data along a dimension. The summarization rule might be an aggregate function, such as computing totals along a hierarchy or applying a set of formulas such as "profit = sales - expenses".^[5] General aggregation functions may be costly to compute when rolling up: if they cannot be determined from the cells of the cube, they must be computed from the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). Aggregation functions that can be determined from the cells are known as decomposable aggregation functions, and allow efficient computation.^[7] For example, it is easy to support COUNT, MAX, MIN, and SUM in OLAP, since these can be computed for each cell of the OLAP cube and then rolled up, since on overall sum (or count etc.) is the sum of sub-sums, but it is difficult to support MEDIAN, as that must be computed for every view separately: the median of a set is not the median of medians of subsets.

Pivot allows an analyst to rotate the cube in space to see its various faces. For example, cities could be arranged vertically and products horizontally while viewing data for a particular quarter. Pivoting could replace products with time periods to see data across time for a single product.^[5]^[8]

The picture shows a pivoting operation: The whole cube is rotated, giving another perspective on the data.

Mathematical definition

In database theory, an OLAP cube is^[9] an abstract representation of a projection of an RDBMS relation. Given a relation of order N, consider a projection that subtends X, Y, and Z as the key and W as the residual attribute. Characterizing this as a function,

f : (X,Y,Z) → W,

the attributes X, Y, and Z correspond to the axes of the cube, while the W value corresponds to the data element that populates each cell of the cube.

Insofar as two-dimensional output devices cannot readily characterize three dimensions, it is more practical to project "slices" of the data cube (we say project in the classic vector analytic sense of dimensional reduction, not in the SQL sense, although the two are conceptually similar),

g : (X,Y) → W

which may suppress a primary key, but still have some semantic significance, perhaps a slice of the triadic functional representation for a given Z value of interest.

The motivation^[9] behind OLAP displays harks back to the cross-tabbed report paradigm of 1980s DBMS, and to earlier contingency tables from 1904. The result is a spreadsheet-style display, where values of X populate row $1; values of Y populate column $A; and values of g : ( X, Y ) → W populate the individual cells at intersections of X-labeled columns and Y-labeled rows, "southeast", so to speak, of $B$2, with $B$2 itself included.

Related Research Articles

Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

<span class="mw-page-title-main">Star schema</span> Data warehousing schema

In computing, the star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.

In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set.

Essbase is a multidimensional database management system (MDBMS) that provides a platform upon which to build analytic applications. Essbase began as a product from Arbor Software, which merged with Hyperion Software in 1998. Oracle Corporation acquired Hyperion Solutions Corporation in 2007, as of 2009 Oracle markets Essbase as "Oracle Essbase", both on-premises and in Oracle's Cloud Infrastructure (OCI). Until late 2005 IBM also marketed an OEM version of Essbase as DB2 OLAP Server.

In computer programming contexts, a data cube is a multi-dimensional ("n-D") array of values. Typically, the term data cube is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data.

A pivot table is a table of grouped values that aggregates the individual items of a more extensive table within one or more discrete categories. This summary might include sums, averages, or other statistics, which the pivot table groups together using a chosen aggregation function applied to the grouped values.

Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spreadsheet formulas.

Data drilling refers to any of various operations and transformations on tabular, relational, and multidimensional data. The term has widespread use in various contexts, but is primarily associated with specialized software designed specifically for data analysis.

Microsoft SQL Server Analysis Services (SSAS) is an online analytical processing (OLAP) and data mining tool in Microsoft SQL Server. SSAS is used as a tool by organizations to analyze and make sense of information possibly spread out across multiple databases, or in disparate tables or files. Microsoft has included a number of services in SQL Server related to business intelligence and data warehousing. These services include Integration Services, Reporting Services and Analysis Services. Analysis Services includes a group of OLAP and data mining capabilities and comes in two flavors multidimensional and tabular, where the difference between the two is how the data is presented. In a tabular model, the information is arranged in two-dimensional tables which can thus be more readable for a human. A multidimensional model can contain information with many degrees of freedom, and must be unfolded to increase readability by a human.

Business intelligence software is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that has been previously stored, often - though not necessarily - in a data warehouse or data mart.

In database management, an aggregate function or aggregation function is a function where the values of multiple rows are grouped together to form a single summary value.

Palo is a memory resident multidimensional database server and typically used as a business intelligence tool for controlling and budgeting purposes with spreadsheet software acting as the user interface. Beyond the multidimensional data concept, Palo enables multiple users to share one centralised data storage.

<span class="mw-page-title-main">XLeratorDB</span>

XLeratorDB is a suite of database function libraries that enable Microsoft SQL Server to perform a wide range of additional (non-native) business intelligence and ad hoc analytics. The libraries, which are embedded and run centrally on the database, include more than 450 individual functions similar to those found in Microsoft Excel spreadsheets. The individual functions are grouped and sold as six separate libraries based on usage: finance, statistics, math, engineering, unit conversions and strings. WestClinTech, the company that developed XLeratorDB, claims it is "the first commercial function package add-in for Microsoft SQL Server."

XLCubed is a business intelligence software and consulting services company. Established in 2001, XLCubed develops business intelligence software and provides business intelligence and performance management consulting services. The company is privately held and based out of the United Kingdom in the Thames Valley IT corridor.

The dimensional fact model (DFM) is an ad hoc and graphical formalism specifically devised to support the conceptual modeling phase in a data warehouse project. DFM is extremely intuitive and can be used by analysts and non-technical users as well. A short-term working is sufficient to realize a clear and exhaustive representation of multidimensional concepts. It can be used from the initial data warehouse life-cycle steps, to rapidly devise a conceptual model to share with customers.

Power Pivot, formerly known as PowerPivot, is a feature of Microsoft Excel, a computer software spreadsheet. It is available as an add-in in Excel 2010, 2013 in separate downloads, and as an add-in included with the Excel 2016 program. Power Pivot extends a local instance of Microsoft Analysis Services tabular that is embedded directly into an Excel Workbook. This allows a user to build a ROLAP model in Power Pivot, and use pivot tables to explore the model once it is built. This allows Excel to act as a self-service business intelligence (BI) platform, implementing professional expression languages to query the model and calculate advanced measures.

Array database management systems provide database services specifically for arrays, that is: homogeneous collections of data items, sitting on a regular grid of one, two, or more dimensions. Often arrays are used to represent sensor, simulation, image, or statistics data. Such arrays tend to be Big Data, with single objects frequently ranging into Terabyte and soon Petabyte sizes; for example, today's earth and space observation archives typically grow by Terabytes a day. Array databases aim at offering flexible, scalable storage and retrieval on this information category.

Data Analysis Expressions (DAX) is the native formula and query language for Microsoft PowerPivot, Power BI Desktop and SQL Server Analysis Services (SSAS) Tabular models. DAX includes some of the functions that are used in Excel formulas with additional functions that are designed to work with relational data and perform dynamic aggregation. It is, in part, an evolution of the Multidimensional Expression (MDX) language developed by Microsoft for Analysis Services multidimensional models combined with Excel formula functions. It is designed to be simple and easy to learn, while exposing the power and flexibility of PowerPivot and SSAS tabular models.

Cubes is a light-weight open source multidimensional modelling and OLAP toolkit for development reporting applications and browsing of aggregated data written in Python programming language released under the MIT License.

The functional database model is used to support analytics applications such as financial planning and performance management. The functional database model, or the functional model for short, is different from but complementary to the relational model. The functional model is also distinct from other similarly named concepts, including the DAPLEX functional database model and functional language databases.

References

↑ Gray, Jim; Bosworth, Adam; Layman, Andrew; Pirahesh, Hamid (1996). "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals". Proceedings of the International Conference on Data Engineering (ICDE). pp. 152–159. arXiv: cs/0701155 . doi:10.1109/ICDE.1996.492099.
↑ "Overview of Online Analytical Processing (OLAP)". support.office.com. Retrieved 2018-09-08.
↑ "Cybertec releases OLAP cubes for PostgreSQL". PostgreSQL. 2006-10-02. Archived from the original on 2013-06-30. Retrieved 2008-03-05.
↑ "Oracle9i Data Warehousing Guide hierarchy". Lorentz Center. Retrieved 2008-03-05.
1 2 3 4 5 6 "OLAP and OLAP Server Definitions". The OLAP Council. 1995. Retrieved 2008-03-18.
↑ "Glossary of Data Mining Terms". University of Alberta. 1999. Retrieved 2008-03-17.
↑ Zhang 2017, p. 1.
↑ "Computer Encyclopedia: multidimensional views". Answers.com. Retrieved 2008-03-05.
1 2 Gray, Jim; Bosworth, Adam; Layman, Andrew; Priahesh, Hamid (1995-11-18). "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals". Proc. 12th International Conference on Data Engineering. IEEE. pp. 152–159. Retrieved 2008-11-09.

Zhang, Chao (2017). Symmetric and Asymmetric Aggregate Function in Massively Parallel Computing (Technical report).

External links

Daniel Lemire (December 2007). "Data Warehousing and OLAP - A Research-Oriented Bibliography". Archived from the original on 2013-07-06. Retrieved 2008-03-05.
The RDF Data Cube Vocabulary
Microsoft Azure: Online analytical processing (OLAP)

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Gray, Jim; Bosworth, Adam; Layman, Andrew; Pirahesh, Hamid (1996). "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals". Proceedings of the International Conference on Data Engineering (ICDE). pp. 152–159. arXiv: cs/0701155 . doi:10.1109/ICDE.1996.492099.

[2] "Overview of Online Analytical Processing (OLAP)". support.office.com. Retrieved 2018-09-08.

[3] "Cybertec releases OLAP cubes for PostgreSQL". PostgreSQL. 2006-10-02. Archived from the original on 2013-06-30. Retrieved 2008-03-05.

[4] "Oracle9i Data Warehousing Guide hierarchy". Lorentz Center. Retrieved 2008-03-05.

[OLAPGlossary1995-5] 1 2 3 4 5 6 "OLAP and OLAP Server Definitions". The OLAP Council. 1995. Retrieved 2008-03-18.

[6] "Glossary of Data Mining Terms". University of Alberta. 1999. Retrieved 2008-03-17.

[FOOTNOTEZhang20171-7] Zhang 2017, p. 1.

[8] "Computer Encyclopedia: multidimensional views". Answers.com. Retrieved 2008-03-05.

[DataCubeGray1995-9] 1 2 Gray, Jim; Bosworth, Adam; Layman, Andrew; Priahesh, Hamid (1995-11-18). "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals". Proc. 12th International Conference on Data Engineering. IEEE. pp. 152–159. Retrieved 2008-11-09.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]