XSLT/Muenchian grouping


Muenchian grouping (or the Muenchian method, named after Steve Muench) is an algorithm for grouping data in XSL Transformations (XSLT) 1.0. It builds an index of nodes by grouping key, uses that index to find the distinct key values, and then retrieves all nodes that share each key. This improves on the traditional approach to grouping, in which each node is compared with the preceding (or following) nodes to determine whether its key is unique (if it is, the node starts a new group). [1] In both cases the key can be an attribute, an element, or a computed value.
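For comparison, the traditional approach can be sketched roughly as follows. This is a minimal illustration, not taken from the cited sources: it assumes product elements with a category attribute, and the name attribute and the groups/group/item output elements are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <groups>
      <!-- A product starts a new group when no earlier product in
           document order has the same category. -->
      <xsl:for-each
          select="//product[not(@category = preceding::product/@category)]">
        <xsl:variable name="category" select="@category"/>
        <group category="{$category}">
          <!-- Collect the members by scanning the document again -->
          <xsl:for-each select="//product[@category = $category]">
            <!-- name is a hypothetical attribute used for illustration -->
            <item><xsl:value-of select="@name"/></item>
          </xsl:for-each>
        </group>
      </xsl:for-each>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

Because every candidate node is compared against all earlier nodes, the cost of this approach tends to grow quickly with document size, which is the problem the key-based method is meant to avoid.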

The unique identifier is referred to as a key because the technique uses the xsl:key element and the key() function to identify and track the grouping value.

The technique is not necessary in XSLT 2.0 and later, which introduce the xsl:for-each-group element.
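A minimal sketch of the XSLT 2.0 form, using the same product and category names as the example in this article; the name attribute and the output element names are hypothetical, and an XSLT 2.0 processor is assumed:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <groups>
      <!-- One iteration per distinct value of @category -->
      <xsl:for-each-group select="//product" group-by="@category">
        <group category="{current-grouping-key()}">
          <!-- current-group() returns the products in this group -->
          <xsl:for-each select="current-group()">
            <!-- name is a hypothetical attribute used for illustration -->
            <item><xsl:value-of select="@name"/></item>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

The built-in functions current-grouping-key() and current-group() take over the roles that explicit variables play in an XSLT 1.0 stylesheet.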

General aspect of the transform

The method takes advantage of XSLT's ability to index documents using a key. The trick is to use the index to efficiently find the set of unique grouping keys, and then to use this set to process all the nodes in each group: [2]

<xsl:key name="products-by-category" match="product" use="@category"/>

<xsl:template match="/">
  <xsl:for-each select="//product[count(. | key('products-by-category', @category)[1]) = 1]">
    <xsl:variable name="current-grouping-key" select="@category"/>
    <xsl:variable name="current-group"
                  select="key('products-by-category', $current-grouping-key)"/>
    <xsl:for-each select="$current-group">
      <!-- processing for elements in group -->
      <!-- you can use xsl:sort here also, if necessary -->
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>
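The predicate count(. | key(...)[1]) = 1 holds only when the current node and the first node returned by the key are the same node, because the union of a node with itself contains a single node. A commonly used equivalent test compares generate-id() values instead. The following complete stylesheet is a sketch under stated assumptions: the sample input, the name attribute, and the output element names are hypothetical and are not part of the cited recipe.

```xml
<?xml version="1.0"?>
<!-- Assumed input (hypothetical data):
     <products>
       <product category="tools" name="hammer"/>
       <product category="toys"  name="kite"/>
       <product category="tools" name="saw"/>
     </products>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every product by its category -->
  <xsl:key name="products-by-category" match="product" use="@category"/>

  <xsl:template match="/">
    <groups>
      <!-- Keep only the first product of each category: its generated id
           equals the id of the first node the key returns for that value. -->
      <xsl:for-each
          select="//product[generate-id() =
                  generate-id(key('products-by-category', @category)[1])]">
        <xsl:sort select="@category"/>
        <group category="{@category}">
          <!-- Look up all members of the group through the index -->
          <xsl:for-each select="key('products-by-category', @category)">
            <xsl:sort select="@name"/>
            <item><xsl:value-of select="@name"/></item>
          </xsl:for-each>
        </group>
      </xsl:for-each>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

With the assumed input, this should produce one group element per category (tools and toys), each listing its products by name.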

Although the Muenchian method continues to work in XSLT 2.0, for-each-group is preferred because it is likely to be at least as efficient, and probably more so. The Muenchian method can only be used for value-based grouping.
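As an illustration of grouping that depends on position rather than on value alone, XSLT 2.0 offers the group-adjacent attribute, which groups runs of consecutive items that share a key. The sketch below assumes that every product element carries a category attribute; the runs and run output elements are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <runs>
      <!-- A new group starts whenever @category differs from the
           category of the immediately preceding selected product. -->
      <xsl:for-each-group select="//product" group-adjacent="@category">
        <run category="{current-grouping-key()}"
             size="{count(current-group())}"/>
      </xsl:for-each-group>
    </runs>
  </xsl:template>

</xsl:stylesheet>
```

Here two products with the same category that are separated by a product of another category fall into different groups, something a lookup keyed on the value alone does not express directly.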

References

  1. Jeni Tennison, "Grouping Using the Muenchian Method".
  2. "Recipe 6.2. Prefer for-each-group over Muenchian Method of Grouping".