Power Query

Last updated
The Power Query user interface, as shown in the Power Query Online experience. Power Query User Interface.png
The Power Query user interface, as shown in the Power Query Online experience.

Power Query is an ETL tool created by Microsoft for data extraction, loading and transformation, and is used to retrieve data from sources, process it, and load them into one or more target systems. Power Query is available in several variations within the Microsoft Power Platform, and is used for business intelligence on fully or partially self-service platforms. It is found in software such as Excel, Power BI, Analysis Services, Dataverse, [1] Power Apps, Azure Data Factory, SSIS, Dynamics 365, and in cloud services such as Microsoft Dataflows, [2] including Power BI Dataflow used with the online Power BI Service or the somewhat more generic version of Microsoft Dataflow used with Power Automate.

Contents

ETL is closely related to data modeling, [3] and for transformation, Power Query can be used to develop a logical data model in those cases where the data does not already have one, or where there is a need to further develop the data model.

History

Power Query was firstly included as an optional additional import feature in Excel besides PowerPivot (used for providing a data model to the PivotTables pivot tables, and more) in Excel 2010 and 2013. In Excel 2016, the function was included in standard Excel and renamed Get & Transform for a short time, but has since been named Power Query again.

M Formula language

Power Query is built on what was then[ when? ] a new query language called M. It is a mashup language (hence the letter M) designed to create queries that mix together data. It is similar to the F Sharp programming language, and according to Microsoft it is a "mostly pure, higher-order, dynamically typed, partially lazy, functional language." The M language is case-sensitive.

Much of the user interaction with Power Query can be done via graphical user interfaces with wizards, and this can be used for many common or basic tasks. It is also possible to use the advanced editing mode where the developer can write in the M formula language; this gives greater expressive power, more possibilities, and can also be used to change the code generated by the graphical wizards.

Let expression

User queries are typically written with a top level let expression. The let expression contains a list of comma-separated named reference (variables) bindings and an in expression which is what the let expression evaluates to. The in expression can reference the variables and the variables can reference each other. Backwards and forward referencing is allowed, and self-referencing is allowed by prefixing the @ on the variable. Variables are recursively evaluated as needed to evaluate the in expression. No variable is evaluated more than once.

Examples

leta="Hello",b="World",result=a&" "&binresult

Assertions and Datatypes

Variables are not typed in Power Query. Instead, an expression can have a type assertion which will evaluate to an error when the expression does not evaluate to a value compatible with the assertion. Assertions can be preceded by nullable to include null in the allowed values.

Datatypes/Assertions
NameDescriptionDatatypeAssertion
numberAssertion for integer and floating-point numbersDark Red x.svg NoGreen check.svg Yes
intSigned 32-bit integerGreen check.svg YesDark Red x.svg No
longSigned 64-bit integerGreen check.svg YesDark Red x.svg No
doubleIEEE 754 floatGreen check.svg YesDark Red x.svg No
decimal128-bit float. Same as C#'s decimalGreen check.svg YesDark Red x.svg No
timeTime of dayGreen check.svg YesGreen check.svg Yes
dateA calendar date ranging from 1 CE to 9999 CE in the Georgian CalendarGreen check.svg YesGreen check.svg Yes
datetimeA composite of the date and time datatypesGreen check.svg YesGreen check.svg Yes
durationA measurement of elapsed time (can be negative)Green check.svg YesGreen check.svg Yes
logicalRepresents a Boolean true or false valueGreen check.svg YesGreen check.svg Yes
textA Unicode stringGreen check.svg YesGreen check.svg Yes
guidA Globally Unique Identifier (Converts to a text as needed automatically)Green check.svg YesDark Red x.svg No
listAn ordered list of valuesGreen check.svg YesGreen check.svg Yes
recordAn ordered map from text to any valueGreen check.svg YesGreen check.svg Yes
tableA 2D matrix where each column has a unique name and type (type not checked on table contents)Green check.svg YesGreen check.svg Yes
functionA power query functionGreen check.svg YesGreen check.svg Yes
typeRepresents a datatype and may contain assertion informationGreen check.svg YesGreen check.svg Yes
actionAn internally used datatypeGreen check.svg YesGreen check.svg Yes
nullThe null singletonGreen check.svg YesGreen check.svg Yes
anyRepresents all valuesDark Red x.svg NoGreen check.svg Yes
anynonnullRepresents all values except nullDark Red x.svg NoGreen check.svg Yes
noneRepresents no values and always fails as an assertionDark Red x.svg NoGreen check.svg Yes
errorA pseudo value representing an errorDark Red x.svg NoDark Red x.svg No

Comments

Power Query supports the C block (/* ... */) and C line (// ...) comments.

DirectQuery

In Power BI, use of M-code is somewhat limited in DirectQuery, as opposed to Import which has all capabilities. This is due to the requirement that M-code in DirectQuery has to be translated into SQL at runtime.

Query Folding

Query Folding is the ability for the Power Query steps to be transpiled into a single query at the data source (for example in Transact SQL). As such, Query Folding works like a traditional ETL process, and enables working on the data before loading. Query Folding is not always supported. Steps like filtering, selecting columns and simple SQL arithmetic are supported. Steps like creating index and appending or merging non foldable sources with foldable sources are not. Folding indicators (such as folding, not folding, might fold, opaque, unknown) might indicate up to which step a query might fold. Non-folding queries will have to be performed on the client-side. The order of queries can determine how many of the steps which get folded.

See also

Related Research Articles

<span class="mw-page-title-main">Microsoft Access</span> Database manager part of the Microsoft 365 package

Microsoft Access is a database management system (DBMS) from Microsoft that combines the relational Access Database Engine (ACE) with a graphical user interface and software-development tools. It is a member of the Microsoft 365 suite of applications, included in the Professional and higher editions or sold separately.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

In computing, online analytical processing, or OLAP, is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

<span class="mw-page-title-main">Extract, transform, load</span> Procedure in computing

Extract, transform, load (ETL) is a three-phase computing process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collected from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.

<span class="mw-page-title-main">Visual programming language</span> Programming language written graphically by a user

In computing, a visual programming language, also known as diagrammatic programming, graphical programming or block coding, is a programming language that lets users create programs by manipulating program elements graphically rather than by specifying them textually. A VPL allows programming with visual expressions, spatial arrangements of text and graphic symbols, used either as elements of syntax or secondary notation. For example, many VPLs are based on the idea of "boxes and arrows", where boxes or other screen objects are treated as entities, connected by arrows, lines or arcs which represent relations. VPLs are generally the basis of low-code development platforms.

Essbase is a multidimensional database management system (MDBMS) that provides a platform upon which to build analytic applications. Essbase began as a product from Arbor Software, which merged with Hyperion Software in 1998. Oracle Corporation acquired Hyperion Solutions Corporation in 2007. Until late 2005 IBM also marketed an OEM version of Essbase as DB2 OLAP Server.

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. UDFs are usually written for the requirement of its creator.

A pivot table is a table of values which are aggregations of groups of individual values from a more extensive table within one or more discrete categories. The aggregations or summaries of the groups of the individual terms might include sums, averages, counts, or other statistics. A pivot table is the outcome of the statistical processing of tabularized raw data and can be used for decision-making.

Microsoft SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.

The null coalescing operator is a binary operator that is part of the syntax for a basic conditional expression in several programming languages, such as : C# since version 2.0, Dart since version 1.12.0, PHP since version 7.0.0, Perl since version 5.10 as logical defined-or, PowerShell since 7.0.0, and Swift as nil-coalescing operator.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

Business Intelligence Development Studio (BIDS) is the former IDE from Microsoft, and was used to develop data analysis and business intelligence solutions utilizing Microsoft SQL Server Analysis Services, Reporting Services and Integration Services. It is based on the Microsoft Visual Studio development environment, but customized with the SQL Server services-specific extensions and project types, including tools, controls and projects for reports, ETL dataflows, OLAP cubes and data mining structure.

Power Pivot, formerly known as PowerPivot, is a self-service business intelligence feature of Microsoft Excel which facilitates the creation of a tabular model to import, relate, and analyze data from a variety of sources.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">Array DBMS</span> System that provides database services specifically for arrays

An array database management system or array DBMS provides database services specifically for arrays, that is: homogeneous collections of data items, sitting on a regular grid of one, two, or more dimensions. Often arrays are used to represent sensor, simulation, image, or statistics data. Such arrays tend to be Big Data, with single objects frequently ranging into Terabyte and soon Petabyte sizes; for example, today's earth and space observation archives typically grow by Terabytes a day. Array databases aim at offering flexible, scalable storage and retrieval on this information category.

<span class="mw-page-title-main">BIDS Helper</span>

BIDS Helper is a Visual Studio open source extension with multiple features that extend and enhance business intelligence development functionality in all editions of Microsoft's SQL Server 2005, 2008, 2008 R2 and 2012. BIDS Helper improves the development environment for integration, analysis and reporting services. BIDS Helper is hosted on GitHub.

Data Analysis Expressions (DAX) is the native formula and query language for Microsoft PowerPivot, Power BI Desktop and SQL Server Analysis Services (SSAS) Tabular models. DAX includes some of the functions that are used in Excel formulas with additional functions that are designed to work with relational data and perform dynamic aggregation. It is, in part, an evolution of the Multidimensional Expression (MDX) language developed by Microsoft for Analysis Services multidimensional models combined with Excel formula functions. It is designed to be simple and easy to learn, while exposing the power and flexibility of PowerPivot and SSAS tabular models.

Microsoft Power BI is an interactive data visualization software product developed by Microsoft with a primary focus on business intelligence (BI). It is part of the Microsoft Power Platform. Power BI is a collection of software services, apps, and connectors that work together to turn various sources of data into static and interactive data visualizations. Data may be input by reading directly from a database, webpage, PDF, or structured files such as spreadsheets, CSV, XML, JSON, XLSX, and SharePoint.

References

  1. DougKlopfenstein. "Power Query documentation - Power Query" . Retrieved 2022-10-27.
  2. ptyx507x. "What is Power Query? - Power Query" . Retrieved 2022-10-27.{{cite web}}: CS1 maint: numeric names: authors list (link)
  3. Dearmer, Abe. "Why ETL Data Modeling is Critical in 2021" . Retrieved 2022-10-27.