Identity transform


The identity transform is a data transformation that copies the source data into the destination data without change.


The identity transformation is considered an essential process in creating a reusable transformation library. By creating a library of variations of the base identity transformation, a variety of data transformation filters can be easily maintained. These filters can be chained together in a manner similar to UNIX shell pipes.
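The following Python sketch illustrates the idea with the standard library's xml.etree.ElementTree module rather than XSLT. The filter names (identity, rename, drop_attribute, pipe) are hypothetical. Each variation starts from the base identity copy and changes one detail, and pipe chains filters left to right in the way a shell pipeline chains commands.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# A filter maps an element tree to a new element tree.
Filter = Callable[[ET.Element], ET.Element]


def identity(elem: ET.Element) -> ET.Element:
    """Base filter: recursively copy tag, attributes, text, tail and children."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(identity(child))
    return out


def rename(old: str, new: str) -> Filter:
    """Hypothetical variation: identity copy, then rename elements tagged `old`."""
    def run(elem: ET.Element) -> ET.Element:
        out = identity(elem)
        for node in out.iter(old):  # iter() also visits `out` itself
            node.tag = new
        return out
    return run


def drop_attribute(name: str) -> Filter:
    """Hypothetical variation: identity copy, then remove attribute `name` everywhere."""
    def run(elem: ET.Element) -> ET.Element:
        out = identity(elem)
        for node in out.iter():
            node.attrib.pop(name, None)
        return out
    return run


def pipe(*filters: Filter) -> Filter:
    """Chain filters left to right, like `f1 | f2 | f3` in a UNIX shell."""
    return lambda elem: reduce(lambda acc, f: f(acc), filters, elem)


doc = ET.fromstring('<a id="1"><b id="2">hi</b>tail</a>')
result = pipe(rename("b", "c"), drop_attribute("id"))(doc)
print(ET.tostring(result, encoding="unicode"))  # <a><c>hi</c>tail</a>
```

Because every filter returns a fresh tree, the input document is left unchanged and the same filters can be recombined into other pipelines.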

Examples of recursive transforms

The "copy with recursion" approach makes it possible, by changing small portions of the code, to produce entirely new and different output that filters or updates the input. Understanding identity by recursion is therefore the key to understanding these filters.

Using XSLT

The most frequently cited example of the identity transform (for XSLT version 1.0) is the "copy.xsl" transform. It uses the xsl:copy instruction [1] to perform the identity transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

This template works by matching all attributes (@*) and other nodes (node()), copying each matched node, and then applying the identity transformation to all attributes and child nodes of the context node. This recursively descends the element tree and reproduces every structure in the same form it had in the original file, within the limits of what information is considered significant in the XPath data model. Since node() matches elements, text nodes, processing instructions, and comments, and @* matches attributes, all XML nodes are copied. The root node is not matched by node(), but XSLT's built-in template rule for the root simply applies templates to its children.
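As a rough analogue outside XSLT, the Python sketch below performs the same node-by-node recursion with xml.etree.ElementTree; the helper names are hypothetical. ElementTree's data model differs from XPath's: text is stored in each element's text and tail fields rather than as separate nodes, and the default parser discards comments and processing instructions. The sketch therefore assumes Python 3.8 or later and asks the TreeBuilder to keep them.

```python
import xml.etree.ElementTree as ET


def parse_keeping_all_nodes(xml_text: str) -> ET.Element:
    """Parse XML, keeping comments and processing instructions found inside
    the root element (ElementTree drops them by default; Python 3.8+)."""
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    parser = ET.XMLParser(target=builder)
    parser.feed(xml_text)
    return parser.close()


def identity(node: ET.Element) -> ET.Element:
    """Copy a node, then recurse into its children, mirroring the
    match="@*|node()" template rule."""
    # Comments and PIs are also Element objects whose tag is the
    # ET.Comment / ET.ProcessingInstruction factory, so one path covers them.
    out = ET.Element(node.tag, dict(node.attrib))  # the node and its attributes
    out.text = node.text   # first text child (or comment/PI content)
    out.tail = node.tail   # text that follows this node inside its parent
    for child in node:     # child elements, comments and PIs, in order
        out.append(identity(child))
    return out


source = '<doc a="1"><!-- note --><?app hint?><p>text</p>more</doc>'
result = identity(parse_keeping_all_nodes(source))
print(ET.tostring(result, encoding="unicode") == source)  # True
```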

A more explicit version of the identity transform is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|*|processing-instruction()|comment()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

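This version is equivalent to the first, but explicitly enumerates the types of XML nodes that it will copy. Text nodes, although selected, are not matched by this template; they are output by XSLT's built-in template rule for text. Both versions copy data that is unnecessary for most XML usage (e.g., comments).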

XSLT 3.0

XSLT 3.0 [2] specifies an on-no-match attribute of the xsl:mode declaration that allows the identity transform to be declared rather than implemented as an explicit template rule. Specifically:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>

is essentially equivalent to the earlier template rules. See the XSLT 3.0 standard's description of shallow-copy [3] for details.

Finally, note that markup details, such as the use of CDATA sections or the order of attributes, are not necessarily preserved in the output, since this information is not part of the XPath data model. To emit CDATA sections in the output, the stylesheet that contains the identity transform template (not the identity transform template itself) should use the cdata-section-elements attribute of xsl:output.

cdata-section-elements specifies a list of the names of elements whose text node children should be output using CDATA sections. [1] For example:

<xsl:output method="xml" encoding="utf-8" cdata-section-elements="element-name-1 element-name-2"/>
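The same loss of CDATA markup occurs in other tree-based XML APIs. As an illustration outside XSLT, the Python sketch below uses the standard xml.etree.ElementTree module, whose data model (not the XPath one) is assumed here: a CDATA section is parsed as ordinary text, so an identity-style round trip serializes it with normal escaping instead.

```python
import xml.etree.ElementTree as ET

source = "<code><![CDATA[if (a < b) return;]]></code>"

# The parser reports CDATA content as ordinary character data, so the tree
# records only the text, not the fact that it came from a CDATA section.
root = ET.fromstring(source)
print(root.text)                              # if (a < b) return;

# Serializing the tree escapes "<" instead of re-creating the CDATA section.
print(ET.tostring(root, encoding="unicode"))  # <code>if (a &lt; b) return;</code>
```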

Using XQuery

XQuery can define recursive functions. The following example XQuery function copies the input directly to the output without modification.

declare function local:copy($element as element()) {
  element {node-name($element)}
  {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then local:copy($child)
      else $child
  }
};

The same result can also be achieved with a typeswitch-style transform.

xquery version "1.0";

(: copy the input to the output without modification :)
declare function local:copy($input as item()*) as item()* {
  for $node in $input
  return
    typeswitch ($node)
      case document-node()
        return document { local:copy($node/node()) }
      case element()
        return
          element { node-name($node) }
          {
            (: output each attribute in this element :)
            for $att in $node/@*
            return attribute { node-name($att) } { $att },
            (: output all the child nodes of this element recursively :)
            local:copy($node/node())
          }
      (: otherwise pass it through; used for text(), comments, and PIs :)
      default return $node
};

The typeswitch transform is sometimes preferable, since it can easily be modified by adding a case clause for any element that needs special processing.
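A comparable dispatch can be sketched in Python with a dictionary that maps element names to handler functions, falling back to an identity copy when no handler is registered. Adding special processing then means adding one dictionary entry, much as one adds a case clause to the typeswitch. The function names and the upper-casing title handler are hypothetical illustrations built on the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Handler = Callable[[ET.Element], ET.Element]


def transform(node: ET.Element, cases: Dict[str, Handler]) -> ET.Element:
    """Typeswitch-style copy: dispatch on the tag to a special-case handler
    if one is registered, otherwise fall through to the identity copy."""
    handler = cases.get(node.tag)
    if handler is not None:
        return handler(node)
    # default case: copy this node and transform its children recursively
    out = ET.Element(node.tag, dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(transform(child, cases))
    return out


def upper_title(node: ET.Element) -> ET.Element:
    """Hypothetical special case: copy a <title> with its text upper-cased
    (child elements of <title> are not copied in this sketch)."""
    out = ET.Element(node.tag, dict(node.attrib))
    out.text = node.text.upper() if node.text else node.text
    out.tail = node.tail
    return out


book = ET.fromstring("<book><title>xml</title><p>body</p></book>")
print(ET.tostring(transform(book, {"title": upper_title}), encoding="unicode"))
# <book><title>XML</title><p>body</p></book>
```

With an empty dictionary, transform reduces to a plain identity copy.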

Non-recursive transforms

The following are two simple, illustrative "copy all" transforms.

Using XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

Using XProc

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/ns/xproc">
  <p:identity/>
</p:pipeline>

One important point about the XProc identity step is that it accepts either a single document, as in this example, or a sequence of documents as input.

More complex examples

Generally, the identity transform is used as a base on which local modifications can be made.

Remove named element transform

Using XSLT

The identity transformation can be modified to copy everything from an input tree to an output tree except a given node. For example, the following will copy everything from the input to the output except the social security number:

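<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- remove all social security numbers -->
  <xsl:template match="PersonSSNID"/>
</xsl:stylesheet>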

Using XQuery

declare function local:copy-filter-elements($element as element(),
                                            $element-name as xs:string*) as element() {
  element {node-name($element)}
  {
    $element/@*,
    for $child in $element/node()[not(name(.) = $element-name)]
    return
      if ($child instance of element())
      then local:copy-filter-elements($child, $element-name)
      else $child
  }
};

To call this function, one would add:

let $filtered-output := local:copy-filter-elements($input, 'PersonSSNID')
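For comparison, the following Python sketch applies the same filter to an xml.etree.ElementTree tree; the function name mirrors the XQuery example but is otherwise hypothetical. Unlike the XPath data model, ElementTree stores the text following an element in that element's tail attribute, so the sketch moves that text onto the preceding node rather than dropping it along with the removed element. Matching is done on ElementTree's tag string (which takes the form {namespace-uri}local-name for namespaced elements), an assumption that differs from the prefixed names compared by name() in the XQuery version.

```python
import xml.etree.ElementTree as ET
from typing import Collection, Union


def copy_filter_elements(elem: ET.Element,
                         names: Union[str, Collection[str]]) -> ET.Element:
    """Identity copy of `elem`, omitting every descendant element whose tag is
    in `names`, together with that element's whole subtree."""
    if isinstance(names, str):  # allow a single name, like xs:string*
        names = {names}
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if child.tag in names:
            # ElementTree stores the text that follows an element in its
            # .tail; keep that text (as the XSLT and XQuery versions do) by
            # attaching it to the previous copied node or the parent's text.
            if child.tail:
                if len(out):
                    out[-1].tail = (out[-1].tail or "") + child.tail
                else:
                    out.text = (out.text or "") + child.tail
            continue
        out.append(copy_filter_elements(child, names))
    return out


person = ET.fromstring(
    "<Person><Name>Ann</Name><PersonSSNID>000-00-0000</PersonSSNID>"
    "<City>Oslo</City></Person>")
filtered = copy_filter_elements(person, "PersonSSNID")
print(ET.tostring(filtered, encoding="unicode"))
# <Person><Name>Ann</Name><City>Oslo</City></Person>
```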

Using XProc

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/ns/xproc">
  <p:identity/>
  <p:delete match="PersonSSNID"/>
</p:pipeline>


References

  1. W3.org - XSL Transformations Version 1.0 - Copying
  2. W3.org - XSL Transformations Version 3.0
  3. W3.org - XSL Transformations Version 3.0 - Built-in Templates: Shallow Copy