Power Query

Last updated

Power Query is an ETL tool created by Microsoft for data extraction, loading and transformation, and is used to retrieve data from sources, process it, and load them into one or more target systems. Power Query is available in several variations within the Microsoft Power Platform, and is used for business intelligence on fully or partially self-service platforms. It is found in software such as Excel, Power BI, Analysis Services, Dataverse, [1] Power Apps, Azure Data Factory, SSIS, Dynamics 365, and in cloud services such as Microsoft Dataflows, [2] including Power BI Dataflow used with the online Power BI Service or the somewhat more generic version of Microsoft Dataflow used with Power Automate.

Contents

ETL is closely related to data modeling, [3] and for transformation, Power Query can be used to develop a logical data model in those cases where the data does not already have one, or where there is a need to further develop the data model.

History

Power Query was included as an additional feature in Power Pivot (used for making pivot tables, and more) in Excel 2010 and 2013. In Excel 2016, the function was renamed Get & Transform for a short time, but has since been named Power Query again.

M Formula language

Power Query is built on what was then[ when? ] a new query language called M. It is a mashup language (hence the letter M) designed to create queries that mix together data. It is similar to the F Sharp programming language, and according to Microsoft it is a "mostly pure, higher-order, dynamically typed, partially lazy, functional language." The M language is case-sensitive.

Much of the user interaction with Power Query can be done via graphical user interfaces with wizards, and this can be used for many common or basic tasks. It is also possible to use the advanced editing mode where the developer can write in the M formula language; this gives greater expressive power, more possibilities, and can also be used to change the code generated by the graphical wizards.

Let function

A Power Query written in M consists of a let expression where data and expressions can be referenced between each other, and an in expression which contains the output. Lines in the let expression (which are also called steps) are separated by a comma at the end of each line, except for the last line. Comments are written in C-style, with inline comments beginning with a forward slash (/) symbol, while block comments which can span multiple lines are delimited by /* and */. The M language is powerful and is sometimes necessary for performing certain forms of queries, but for many common tasks, however, users can also often interact with Power Query through the graphical user interfaces without directly working with the M language.

DirectQuery

In Power BI, use of M-code is somewhat limited in DirectQuery, as opposed to Import which has all capabilities. This is due to the requirement that M-code in DirectQuery has to be translated into SQL at runtime.

Query Folding

Query Folding is the ability for a Power Query to be transpiled into a single query at the data source (for example in Transact SQL). As such, Query Folding works like a traditional ETL process, and enables working on the data before loading. Query Folding is not always supported. Folding indicators (such as folding, not folding, might fold, opaque, unknown) might indicate up to which step a query might fold. Non-folding queries will have to be performed on the client-side. The order of queries can determine how many of the steps which get folded.

See also

Related Research Articles

<span class="mw-page-title-main">Microsoft Access</span> Database manager part of the Microsoft 365 package

Microsoft Access is a database management system (DBMS) from Microsoft that combines the relational Access Database Engine (ACE) with a graphical user interface and software-development tools. It is a member of the Microsoft 365 suite of applications, included in the Professional and higher editions or sold separately.

Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

<span class="mw-page-title-main">Extract, transform, load</span> Procedure in computing

In computing, extract, transform, load (ETL) is a three-phase process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on reccurring schedules either as single jobs or aggregated into a batch of jobs.

<span class="mw-page-title-main">Visual programming language</span> Programming language written graphically by a user

In computing, a visual programming language, also known as diagrammatic programming, graphical programming or block coding, is a programming language that lets users create programs by manipulating program elements graphically rather than by specifying them textually. A VPL allows programming with visual expressions, spatial arrangements of text and graphic symbols, used either as elements of syntax or secondary notation. For example, many VPLs are based on the idea of "boxes and arrows", where boxes or other screen objects are treated as entities, connected by arrows, lines or arcs which represent relations. VPLs are generally the basis of Low-code development platforms.

In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming.

<span class="mw-page-title-main">Data Transformation Services</span> Microsoft database tool

Data Transformation Services (DTS) is a Microsoft database tool with a set of objects and utilities to allow the automation of extract, transform and load operations to or from a database. The objects are DTS packages and their components, and the utilities are called DTS tools. DTS was included with earlier versions of Microsoft SQL Server, and was almost always used with SQL Server databases, although it could be used independently with other databases.

A pivot table is a table of values which are aggregations of groups of individual values from a more extensive table within one or more discrete categories. The aggregations or summaries of the groups of the individual terms might include sums, averages, counts, or other statistics. A pivot table is the outcome of the statistical processing of tabularized raw data and can be used for decision-making.

Microsoft SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks.

SQL Server Reporting Services (SSRS) is a server-based report generating software system from Microsoft. It is part of a suite of Microsoft SQL Server services, including SSAS and SSIS.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.

Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. It allows the use of SQL or another scripting language suitable for the data source to perform required transformations. Scriptella does not offer any graphical user interface.

The null coalescing operator is a binary operator that is part of the syntax for a basic conditional expression in several programming languages, such as : C# since version 2.0, Dart since version 1.12.0, PHP since version 7.0.0., Perl since version 5.10 as logical defined-or, PowerShell since 7.0.0, and Swift as nil-coalescing operator.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

Business Intelligence Development Studio (BIDS) is the former IDE from Microsoft, and was used to develop data analysis and business intelligence solutions utilizing Microsoft SQL Server Analysis Services, Reporting Services and Integration Services. It is based on the Microsoft Visual Studio development environment, but customized with the SQL Server services-specific extensions and project types, including tools, controls and projects for reports, ETL dataflows, OLAP cubes and data mining structure.

Innovative Routines International (IRI), Inc. is an American software company first known for bringing mainframe sort merge functionality into open systems. IRI was the first vendor to develop a commercial replacement for the Unix sort command, and combine data transformation and reporting in Unix batch processing environments. In 2007, IRI's coroutine sort ("CoSort") became the first product to collate and convert multi-gigabyte XML and LDIF files, join and lookup across multiple files, and apply role-based data privacy functions for fields within sensitive files.

Power Pivot, formerly known as PowerPivot, is a feature of Microsoft Excel, a computer software spreadsheet. It is available as an add-in in Excel 2010, 2013 in separate downloads, and as an add-in included with the Excel 2016 program. Power Pivot extends a local instance of Microsoft Analysis Services tabular that is embedded directly into an Excel Workbook. This allows a user to build a ROLAP model in Power Pivot, and use pivot tables to explore the model once it is built. This allows Excel to act as a self-service business intelligence (BI) platform, implementing professional expression languages to query the model and calculate advanced measures.

<span class="mw-page-title-main">BIDS Helper</span>

BIDS Helper is a Visual Studio open source extension with multiple features that extend and enhance business intelligence development functionality in all editions of Microsoft's SQL Server 2005, 2008, 2008 R2 and 2012. BIDS Helper improves the development environment for integration, analysis and reporting services. BIDS Helper is hosted on GitHub.

Data Analysis Expressions (DAX) is the native formula and query language for Microsoft PowerPivot, Power BI Desktop and SQL Server Analysis Services (SSAS) Tabular models. DAX includes some of the functions that are used in Excel formulas with additional functions that are designed to work with relational data and perform dynamic aggregation. It is, in part, an evolution of the Multidimensional Expression (MDX) language developed by Microsoft for Analysis Services multidimensional models combined with Excel formula functions. It is designed to be simple and easy to learn, while exposing the power and flexibility of PowerPivot and SSAS tabular models.

Microsoft Power BI is an interactive data visualization software product developed by Microsoft with a primary focus on business intelligence. It is part of the Microsoft Power Platform. Power BI is a collection of software services, apps, and connectors that work together to turn various sources of data into static and interactive data visualizations. Data may be input by reading directly from a database, webpage, PDF, or structured files such as spreadsheets, CSV, XML, JSON, XLSX, and SharePoint.

<span class="mw-page-title-main">Microsoft Power Platform</span> Family of business software products

Microsoft Power Platform is a line of business intelligence, app development, and app connectivity software applications. Microsoft developed the Power Fx low-code programming language for expressing logic across the Power Platform. It also provides integrations with GitHub and Microsoft Teams among other apps.

References

  1. DougKlopfenstein. "Power Query documentation - Power Query" . Retrieved 2022-10-27.
  2. ptyx507x. "What is Power Query? - Power Query" . Retrieved 2022-10-27.{{cite web}}: CS1 maint: numeric names: authors list (link)
  3. Dearmer, Abe. "Why ETL Data Modeling is Critical in 2021" . Retrieved 2022-10-27.