Wide and narrow data

Wide and narrow (sometimes unstacked and stacked, or wide and tall) are terms used to describe two different presentations for tabular data.[1][2]

Wide

Wide, or unstacked, data is presented with each data variable in a separate column.

Person   Age   Weight   Height
Bob      32    168      180
Alice    24    150      175
Steve    64    144      165
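
The wide table above maps directly onto a data frame with one column per variable. A minimal sketch in Python using pandas (the library discussed under Implementations below); the column names simply mirror the table:

    import pandas as pd

    # Wide format: one row per person, one column per data variable.
    wide = pd.DataFrame({
        "Person": ["Bob", "Alice", "Steve"],
        "Age":    [32, 24, 64],
        "Weight": [168, 150, 144],
        "Height": [180, 175, 165],
    })
    print(wide)   # one column each for Person, Age, Weight and Height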

Narrow

Narrow, stacked, or long data is presented with one column containing all the values and another column listing the context of each value (that is, which variable it records).

Person   Variable   Value
Bob      Age        32
Bob      Weight     168
Bob      Height     180
Alice    Age        24
Alice    Weight     150
Alice    Height     175
Steve    Age        64
Steve    Weight     144
Steve    Height     165

This is often easier to implement: the addition of a new field does not require any change to the structure of the table (see the sketch below). However, it can be harder for people to understand.
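
For example, recording a new variable in the narrow layout only requires appending more rows; the three-column structure is untouched. A hedged sketch using pandas, in which the "Shoe size" variable and its values are purely hypothetical:

    import pandas as pd

    # Narrow format: every observation is one (Person, Variable, Value) row.
    narrow = pd.DataFrame({
        "Person":   ["Bob", "Bob", "Bob", "Alice"],
        "Variable": ["Age", "Weight", "Height", "Age"],
        "Value":    [32, 168, 180, 24],
    })

    # Adding a new field means appending rows with a new Variable label,
    # not altering the table's schema.
    new_rows = pd.DataFrame({
        "Person":   ["Bob", "Alice"],
        "Variable": ["Shoe size", "Shoe size"],
        "Value":    [44, 38],   # hypothetical values for illustration
    })
    narrow = pd.concat([narrow, new_rows], ignore_index=True)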

Implementations

Many statistical and data processing systems have functions to convert between these two presentations; for instance, the R programming language has several packages for this purpose, such as tidyr. The pandas package in Python implements the operation as the melt function, which converts a wide table to a narrow one. The reverse process, converting a narrow table to a wide one, is generally referred to as "pivoting" in the context of data transformations; pandas provides a pivot method for this narrow-to-wide transformation, as sketched below.
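
A rough sketch of both directions in pandas, reusing the example data from this article (melt and pivot are the pandas entry points named above; equivalent tidyr functions exist in R):

    import pandas as pd

    wide = pd.DataFrame({
        "Person": ["Bob", "Alice", "Steve"],
        "Age":    [32, 24, 64],
        "Weight": [168, 150, 144],
        "Height": [180, 175, 165],
    })

    # Wide -> narrow ("melting"): every non-identifier column becomes
    # a (Variable, Value) pair.
    narrow = wide.melt(id_vars="Person", var_name="Variable", value_name="Value")

    # Narrow -> wide ("pivoting"): each distinct label in the Variable
    # column becomes a column of its own again.
    wide_again = narrow.pivot(index="Person", columns="Variable", values="Value")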

See also

  * Serialization – the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later
  * XSLT – a language for transforming XML documents into other documents or formats
  * Extract, transform, load (ETL) – a three-phase process in which data is extracted, transformed and loaded into an output data container
  * Table (information) – an arrangement of information or data, typically in rows and columns
  * XSL-FO – a markup language for XML document formatting
  * LAMP (software bundle) – a common software stack for web applications
  * Pivot table – a table of grouped values that aggregates the individual items of a more extensive table
  * SQL Server Integration Services (SSIS) – a Microsoft SQL Server component for data migration tasks
  * Data cleansing – detecting and correcting corrupt or inaccurate records in a data set
  * Data transformation – converting data from one format or structure into another
  * Entity–attribute–value model (EAV) – a space-efficient data model for entities with many possible but few applicable attributes
  * IPython – a command shell for interactive computing, originally developed for Python
  * XPath – an expression language for querying or transforming XML documents
  * Wing Python IDE – an integrated development environment for Python
  * Knowledge extraction – the creation of knowledge from structured and unstructured sources
  * Yesod (web framework) – a Haskell-based web framework
  * pandas (software) – a Python library for data manipulation and analysis

References

  1. Thompson, M. E. (1997). Theory of Sample Surveys. Chapman & Hall, London. ISBN 0-412-31780-X.
  2. Chantala, K. (2006). "Using STATA to Analyze Data from a Sample Survey". UNC Chapel Hill, Carolina Population Center.