Pivot table

A pivot table is a table of values which are aggregations of groups of individual values from a more extensive table (such as from a database, spreadsheet, or business intelligence program) within one or more discrete categories. The aggregations or summaries of the groups of the individual terms might include sums, averages, counts, or other statistics. A pivot table is the outcome of the statistical processing of tabularized raw data and can be used for decision-making.

Although pivot table is a generic term, Microsoft held a trademark on the term in the United States from 1994 to 2020. [1]

History

In their book Pivot Table Data Crunching, [2] Bill Jelen and Mike Alexander refer to Pito Salas as the "father of pivot tables". While working on a concept for a new program that would eventually become Lotus Improv, Salas noted that spreadsheets have patterns of data. A tool that could help the user recognize these patterns would help to build advanced data models quickly. With Improv, users could define and store sets of categories, then change views by dragging category names with the mouse. This core functionality would provide the model for pivot tables.

Lotus Development released Improv in 1991 on the NeXT platform. A few months after the release of Improv, Brio Technology published a standalone Macintosh implementation, called DataPivot (with technology eventually patented in 1999). [3] Borland purchased the DataPivot technology in 1992 and implemented it in their own spreadsheet application, Quattro Pro.

In 1993 the Microsoft Windows version of Improv appeared. Early in 1994 Microsoft Excel 5 [4] brought a new functionality called a "PivotTable" to market. Microsoft further improved this feature in later versions of Excel.

In 2007 Oracle Corporation made PIVOT and UNPIVOT operators available in Oracle Database 11g. [5]

Mechanics

For typical data entry and storage, data usually appear in flat tables, meaning that they consist of only columns and rows, as in the following portion of a sample spreadsheet showing data on shirt types:

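|    | A      | B      | C     | D          | E     | F     | G     |
|----|--------|--------|-------|------------|-------|-------|-------|
| 1  | Region | Gender | Style | Ship date  | Units | Price | Cost  |
| 2  | East   | Boy    | Tee   | 2005-01-31 | 12    | 11.04 | 10.42 |
| 3  | East   | Boy    | Golf  | 2005-01-31 | 12    | 13.00 | 12.60 |
| 4  | East   | Boy    | Fancy | 2005-01-31 | 12    | 11.96 | 11.74 |
| 5  | East   | Girl   | Tee   | 2005-01-31 | 10    | 11.27 | 10.56 |
| 6  | East   | Girl   | Golf  | 2005-01-31 | 10    | 12.12 | 11.95 |
| 7  | East   | Girl   | Fancy | 2005-01-31 | 10    | 13.74 | 13.33 |
| 8  | West   | Boy    | Tee   | 2005-01-31 | 11    | 11.44 | 10.94 |
| 9  | West   | Boy    | Golf  | 2005-01-31 | 11    | 12.63 | 11.73 |
| 10 | West   | Boy    | Fancy | 2005-01-31 | 11    | 12.06 | 11.51 |
| 11 | West   | Girl   | Tee   | 2005-01-31 | 15    | 13.42 | 13.29 |
| 12 | West   | Girl   | Golf  | 2005-01-31 | 15    | 11.48 | 10.67 |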

While tables such as these can contain many data items, it can be difficult to get summarized information from them. A pivot table can help quickly summarize the data and highlight the desired information. The usage of a pivot table is extremely broad and depends on the situation. The first question to ask is, "What am I seeking?" In the example here, let us ask, "How many Units did we sell in each Region for every Ship Date?":

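| Sum of units | Ship date ▼ |            |            |            |            |            |
|--------------|-------------|------------|------------|------------|------------|------------|
| Region ▼     | 2005-01-31  | 2005-02-28 | 2005-03-31 | 2005-04-30 | 2005-05-31 | 2005-06-30 |
| East         | 66          | 80         | 102        | 116        | 127        | 125        |
| North        | 96          | 117        | 138        | 151        | 154        | 156        |
| South        | 123         | 141        | 157        | 178        | 191        | 202        |
| West         | 78          | 97         | 117        | 136        | 150        | 157        |
| (blank)      |             |            |            |            |            |            |
| Grand total  | 363         | 435        | 514        | 581        | 622        | 640        |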

A pivot table usually consists of row, column and data (or fact) fields. In this case, the column is ship date, the row is region and the data we would like to see is (sum of) units. These fields allow several kinds of aggregations, including: sum, average, standard deviation, count, etc. In this case, the total number of units shipped is displayed here using a sum aggregation.
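As a rough illustration of the aggregation kinds listed above, the sketch below applies several of them to one group of values: the Units of the six East rows in the sample sheet. The choice of the population form of the standard deviation is an assumption made for the example; a given pivot table tool may offer the sample form, or both.

```python
import statistics

# Units for the six East rows of the sample sheet (cells E2 to E7).
east_units = [12, 12, 12, 10, 10, 10]

# Each aggregation reduces the same group of values to a single number.
aggregations = {
    "sum": sum,
    "average": statistics.mean,
    "count": len,
    # Population standard deviation; picking this form is an assumption.
    "standard deviation": statistics.pstdev,
}

summary = {name: func(east_units) for name, func in aggregations.items()}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Only the function applied to the group changes between aggregations; the grouping itself stays the same.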

Implementation

Using the example above, the software first finds all distinct values for Region. In this case, they are North, South, East, and West. It then finds all distinct values for Ship date. Based on the aggregation type, sum, it summarizes the fact (the Units quantities) for each combination of Region and Ship date and displays the results in a multidimensional chart. In the example above, the first datum is 66. This number was obtained by finding all records where Region was East and Ship date was 2005-01-31, and adding together the Units of that collection of records (i.e., cells E2 to E7).
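The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular spreadsheet implements it; the function name `pivot_sum` and the tuple layout are assumptions made for the example. It uses only the rows visible in the sample excerpt, so East sums to 66 as in the text, while West sums to 63 rather than the 78 shown in the pivot table, because the excerpt is only a portion of the full data.

```python
from collections import defaultdict

# Rows from the sample sheet excerpt: (region, gender, style, ship_date, units).
# Price and Cost are left out because this pivot does not use them.
records = [
    ("East", "Boy", "Tee", "2005-01-31", 12),
    ("East", "Boy", "Golf", "2005-01-31", 12),
    ("East", "Boy", "Fancy", "2005-01-31", 12),
    ("East", "Girl", "Tee", "2005-01-31", 10),
    ("East", "Girl", "Golf", "2005-01-31", 10),
    ("East", "Girl", "Fancy", "2005-01-31", 10),
    ("West", "Boy", "Tee", "2005-01-31", 11),
    ("West", "Boy", "Golf", "2005-01-31", 11),
    ("West", "Boy", "Fancy", "2005-01-31", 11),
    ("West", "Girl", "Tee", "2005-01-31", 15),
    ("West", "Girl", "Golf", "2005-01-31", 15),
]


def pivot_sum(rows, row_key, col_key, value):
    """Sum `value` for every (row label, column label) pair found in `rows`."""
    cells = defaultdict(int)
    for rec in rows:
        cells[(rec[row_key], rec[col_key])] += rec[value]
    # The distinct labels become the row and column headings of the pivot.
    row_labels = sorted({r for r, _ in cells})
    col_labels = sorted({c for _, c in cells})
    return row_labels, col_labels, dict(cells)


# Region (index 0) as rows, Ship date (index 3) as columns, Units (index 4) as data.
row_labels, col_labels, cells = pivot_sum(records, row_key=0, col_key=3, value=4)

for region in row_labels:
    print(region, [cells.get((region, date), 0) for date in col_labels])
```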

Pivot tables are not created automatically. For example, in Microsoft Excel one must first select all of the data in the original table, then go to the Insert tab and select "Pivot Table" (or "Pivot Chart"). The user then has the option of either inserting the pivot table into an existing sheet or creating a new sheet to house it. A pivot table field list is then provided, which lists all the column headers present in the data. For instance, if a table represents sales data of a company, it might include Date of sale, Sales person, Item sold, Color of item, Units sold, Per unit price, and Total price. The field list makes the data more readily accessible.

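| Date of sale | Sales person | Item sold | Color of item | Units sold | Per unit price | Total price |
|--------------|--------------|-----------|---------------|------------|----------------|-------------|
| 2013-10-01   | Jones        | Notebook  | Black         | 8          | 25000          | 200000      |
| 2013-10-02   | Prince       | Laptop    | Red           | 4          | 35000          | 140000      |
| 2013-10-03   | George       | Mouse     | Red           | 6          | 850            | 5100        |
| 2013-10-04   | Larry        | Notebook  | White         | 10         | 27000          | 270000      |
| 2013-10-05   | Jones        | Mouse     | Black         | 4          | 700            | 2800        |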

The available fields are listed on the right-hand side of the worksheet. By default, the pivot table layout design appears below this list.

Pivot table fields are the building blocks of pivot tables. Each of the fields from the list can be dragged onto this layout, which has four areas:

  1. Filters
  2. Columns
  3. Rows
  4. Values

Some uses of pivot tables involve the analysis of questionnaires with optional responses, but some implementations of pivot tables do not support these use cases. For example, the implementation in LibreOffice Calc has been unable to process empty cells since 2012. [6] [7]
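To make the questionnaire case concrete, the sketch below counts answers while grouping empty cells under a single label. The response data and the function name `count_with_blanks` are hypothetical, and the "(blank)" label is borrowed from the pivot table shown earlier; this is one possible way to treat empty cells, not a description of any specific application.

```python
from collections import Counter

# Hypothetical questionnaire answers; None marks a question left unanswered.
responses = ["Yes", None, "No", "Yes", None, "Yes"]


def count_with_blanks(values, blank_label="(blank)"):
    """Count each answer, grouping empty cells under a single blank label."""
    return Counter(blank_label if v is None else v for v in values)


counts = count_with_blanks(responses)

for answer, n in counts.items():
    print(answer, n)
```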

Filters

A report filter is used to apply a filter to an entire table. For example, if the "Color of item" field is dragged to this area, the table constructed will have a report filter inserted above it. This report filter will have drop-down options (Black, Red, and White in the example above). When an option is chosen from this drop-down list ("Black" in this example), the visible table will contain only the data from those rows where "Color of item" is "Black".
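The effect of a report filter can be approximated as a row filter applied before any aggregation. The sketch below uses the sales rows from the example above; the function name `apply_report_filter` and the tuple layout are assumptions made for illustration.

```python
# Rows from the sales example: (sales_person, item_sold, color_of_item, units_sold).
sales = [
    ("Jones", "Notebook", "Black", 8),
    ("Prince", "Laptop", "Red", 4),
    ("George", "Mouse", "Red", 6),
    ("Larry", "Notebook", "White", 10),
    ("Jones", "Mouse", "Black", 4),
]


def apply_report_filter(rows, color):
    """Keep only the rows whose Color of item equals the chosen option."""
    return [row for row in rows if row[2] == color]


# The drop-down options are the distinct values of the filtered field.
options = sorted({row[2] for row in sales})

black_rows = apply_report_filter(sales, "Black")
black_units = sum(row[3] for row in black_rows)

print(options)
print(black_rows)
print(black_units)
```

Any row, column, or value field placed on the layout would then be computed from `black_rows` only.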

Columns

Column labels are used to apply a filter to one or more columns that have to be shown in the pivot table. For instance, if the "Sales person" field is dragged to this area, the table constructed will have one column for each distinct value of "Sales person", plus one added column for the Grand Total. In the example above, this instruction will create five columns in the table: one for each of the four salespersons, and Grand Total. There will be a filter above the data (column labels) from which one can select or deselect a particular salesperson for the pivot table.

This table will not have any numerical values, as no numerical field has been selected yet; once one is selected, the values in the "Grand Total" column are updated automatically.

Rows

Row labels are used to apply a filter to one or more rows that have to be shown in the pivot table. For instance, if the "Sales person" field is dragged to this area, the table constructed will have one row for each distinct value of "Sales person", plus one added row for the Grand Total. In the example above, this instruction will create five rows in the table: one for each of the four salespersons, and Grand Total. There will be a filter above the data (row labels) from which one can select or deselect a particular salesperson for the pivot table.

This table will not have any numerical values, as no numerical field has been selected yet; once one is selected, the values in the "Grand Total" row are updated automatically.
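The sketch below shows how row labels, column labels, and the Grand Total row and column fit together once a numerical field is added. It places "Item sold" in the Rows area and "Sales person" in the Columns area; that particular layout is an assumption chosen for illustration, using the sales rows from the example above.

```python
from collections import defaultdict

# (sales_person, item_sold, units_sold) from the sales example.
sales = [
    ("Jones", "Notebook", 8),
    ("Prince", "Laptop", 4),
    ("George", "Mouse", 6),
    ("Larry", "Notebook", 10),
    ("Jones", "Mouse", 4),
]

# Sum Units sold for each (row label, column label) pair.
cells = defaultdict(int)
for person, item, units in sales:
    cells[(item, person)] += units

# Distinct labels, kept in order of first appearance.
row_labels = list(dict.fromkeys(item for _, item, _ in sales))
col_labels = list(dict.fromkeys(person for person, _, _ in sales))

header = ["Item sold"] + col_labels + ["Grand Total"]
table = []
for item in row_labels:
    values = [cells.get((item, person), 0) for person in col_labels]
    table.append([item] + values + [sum(values)])  # row total on the right

# The Grand Total row sums each column across all row labels.
col_totals = [
    sum(cells.get((item, person), 0) for item in row_labels)
    for person in col_labels
]
table.append(["Grand Total"] + col_totals + [sum(col_totals)])

print(header)
for row in table:
    print(row)
```

With four salespersons, the value area has five columns (four plus Grand Total), matching the count given in the text.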

Values

This area usually takes a field with numerical values that can be used for different types of calculations. Text values can also be used; in that case the result is a count instead of a sum. So, in the example above, if the "Units sold" field is dragged to this area along with the row label "Sales person", the instruction will add a new column, "Sum of units sold", holding a value for each salesperson.

| Row labels  | Sum of units sold |
|-------------|-------------------|
| Jones       | 12                |
| Prince      | 4                 |
| George      | 6                 |
| Larry       | 10                |
| Grand total | 32                |
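The table above can be reproduced with a short sketch that sums a numerical field per salesperson and falls back to a count when the field holds text, as described for the Values area. The function name `aggregate_by_person` and the type check used to decide between sum and count are assumptions made for the example.

```python
from collections import defaultdict

# (sales_person, item_sold, units_sold) from the sales example.
sales = [
    ("Jones", "Notebook", 8),
    ("Prince", "Laptop", 4),
    ("George", "Mouse", 6),
    ("Larry", "Notebook", 10),
    ("Jones", "Mouse", 4),
]


def aggregate_by_person(rows, field):
    """Sum numeric values per salesperson; count the values when they are text."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[field])
    result = {}
    for person, values in groups.items():
        numeric = all(isinstance(v, (int, float)) for v in values)
        result[person] = sum(values) if numeric else len(values)
    return result


sum_of_units = aggregate_by_person(sales, field=2)    # numerical field: sum
count_of_items = aggregate_by_person(sales, field=1)  # text field: count
grand_total = sum(sum_of_units.values())

print(sum_of_units)
print(count_of_items)
print(grand_total)
```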

Application support

Pivot tables or pivot functionality are an integral part of many spreadsheet applications and some database software, as well as being found in other data visualization tools and business intelligence packages.

Spreadsheets

Database support

Web applications

Programming languages and libraries

Programming languages and libraries suited to work with tabular data contain functions that allow the creation and manipulation of pivot tables.

Online analytical processing

Excel pivot tables can directly query an online analytical processing (OLAP) server for data instead of reading the data from an Excel spreadsheet. In this configuration, a pivot table is a simple client of an OLAP server. Excel's PivotTable can connect not only to Microsoft's Analysis Services but also to any server compliant with the XML for Analysis (XMLA) OLAP standard.

See also

References

  1. "United States Trademark Serial Number 74472929". 1994-12-27. Retrieved 2022-03-23.
  2. Jelen, Bill; Alexander, Michael (2006). Pivot table data crunching. Indianapolis: Que. pp. 274. ISBN 0-7897-3435-4.
  3. Gartung, Daniel L.; Edholm, Yorgen H.; Edholm, Kay-Martin; McNall, Kristen N.; Lew, Karl M., Patent #5915257, retrieved 2010-02-16.
  4. Darlington, Keith (2012-08-06). VBA For Excel Made Simple. Routledge (published 2012). p. 19. ISBN 9781136349775. Retrieved 2014-09-10. [...] Excel 5, released in early 1994, included the first version of VBA.
  5. Shah, Sharanam; Shah, Vaishali (2008). Oracle for Professionals - Covers Oracle 9i, 10g and 11g. Shroff Publishing Series. Navi Mumbai: Shroff Publishers (published July 2008). p. 549. ISBN 9788184045260. Retrieved 2014-09-10. One of the most useful new features of the Oracle Database 11g from the SQL perspective is the introduction of Pivot and Unpivot operators.
  6. "LibreOffice Calc and Pivot table with empty cells". StackOverflow. 2021-06-17. Retrieved 2021-06-17.
  7. "Functionality request for PIVOTTABLE". LibreOffice bugs. 2012-03-19. Retrieved 2021-06-17.
  8. Dalgleish, Debra (2007). Beginning PivotTables in Excel 2007: From Novice to Professional. Apress. pp. 233–257. ISBN 9781430204336. Retrieved 2018-09-18.
  9. "Busy Developers' Guide to HSSF and XSSF Features". poi.apache.org. Retrieved 2022-12-09.
  10. "Pivot Tables".
  11. "Create & use pivot tables". Docs Editors Help. Google Inc. Retrieved 2020-08-06.
  12. "iWork update brings major changes to Mac, iPhone, and iPad apps". Macworld. Retrieved 2021-09-28.
  13. "PostgreSQL: Documentation: 9.2: tablefunc". postgresql.org. 2017-11-09.
  14. "CONNECT Table Types - PIVOT Table Type". mariadb.com.
  15. "FROM clause plus JOIN, APPLY, PIVOT (T-SQL) - SQL Server".
  16. "pandas.pivot_table". Retrieved 2023-11-21.
  17. "dplyr and Pivot Tables".
  18. "Pivoting".
  19. "pivottabler".

Further reading