XML pipeline

In software, an XML pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected.

For instance, two transformations T1 and T2 can be connected so that an input XML document is transformed by T1 and the output of T1 is then fed to T2 as its input document. Simple pipelines like this one are called linear: a single input document always goes through the same sequence of transformations to produce a single output document.
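The T1-then-T2 arrangement can be illustrated with a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. The two transformations (`t1_rename_items`, `t2_number_entries`), the helper `run_linear` and the sample document are hypothetical names chosen for illustration; they are not part of any XML pipeline standard.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def t1_rename_items(doc):
    """Hypothetical T1: rename every <item> element to <entry>."""
    for node in list(doc.iter("item")):
        node.tag = "entry"
    return doc


def t2_number_entries(doc):
    """Hypothetical T2: add a 1-based position attribute to each <entry>."""
    for position, node in enumerate(doc.iter("entry"), start=1):
        node.set("n", str(position))
    return doc


def run_linear(source, steps):
    """Parse the input, feed each step's output to the next step, serialize."""
    doc = ET.fromstring(source)
    result = reduce(lambda current, step: step(current), steps, doc)
    return ET.tostring(result, encoding="unicode")


SOURCE = "<catalog><item>a</item><item>b</item></catalog>"
output = run_linear(SOURCE, [t1_rename_items, t2_number_entries])
print(output)
# <catalog><entry n="1">a</entry><entry n="2">b</entry></catalog>
```

Because T2 only sees what T1 produces, the order of the steps matters: with the steps reversed, T2 finds no `<entry>` elements and adds no numbering.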

Linear operations

Linear operations can be divided into at least two categories, according to the level at which they operate:

Micro-operations

These operate inside a document, on individual parts of it such as elements or attributes.

Document operations

These take the input document as a whole.

Sequence operations

These were mainly introduced in XProc and handle a sequence of documents as a whole.
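The three levels can be contrasted in a short Python sketch. The function names are hypothetical and the operations are deliberately simple stand-ins; this is not XProc's step library, only an illustration of what each kind of operation takes as its unit of work.

```python
import xml.etree.ElementTree as ET


def micro_set_attribute(doc, tag, name, value):
    """Micro-operation: touch only the matching elements inside a document."""
    for node in doc.iter(tag):
        node.set(name, value)
    return doc


def document_is_well_formed(text):
    """Document operation: accept or reject the input document as a whole."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def sequence_wrap(docs, wrapper_tag):
    """Sequence operation: combine a sequence of documents into one document."""
    wrapper = ET.Element(wrapper_tag)
    wrapper.extend(docs)
    return wrapper


docs = [ET.fromstring("<note>a</note>"), ET.fromstring("<note>b</note>")]
docs = [micro_set_attribute(d, "note", "lang", "en") for d in docs]
combined = ET.tostring(sequence_wrap(docs, "notes"), encoding="unicode")
print(combined)
# <notes><note lang="en">a</note><note lang="en">b</note></notes>
```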

Non-linear

Non-linear operations on pipelines may include:

Some standards also categorize transformations as macro (changes affecting an entire file) or micro (changes affecting only an element or attribute).

XML pipeline languages

XML pipeline languages are used to define pipelines. A program written in an XML pipeline language is run by software known as an XML pipeline engine, which creates the processes, connects them together and finally executes the pipeline. Existing XML pipeline languages include both standards and product-specific languages.
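The division of labour between a pipeline definition and the engine that runs it can be sketched in Python as follows. Everything here is an assumption made for illustration: the definition format (a list of dictionaries), the step names `rename` and `require-root`, and the `build_pipeline` function do not correspond to any real pipeline language or engine.

```python
import xml.etree.ElementTree as ET


# Hypothetical step library: each factory takes options and returns a
# function that maps one document (an Element) to the next document.
def make_rename(options):
    def step(doc):
        for node in list(doc.iter(options["from"])):
            node.tag = options["to"]
        return doc
    return step


def make_require_root(options):
    def step(doc):
        # Minimal stand-in for a validation step: check the root element name.
        if doc.tag != options["name"]:
            raise ValueError(f"expected root <{options['name']}>, got <{doc.tag}>")
        return doc
    return step


STEP_LIBRARY = {"rename": make_rename, "require-root": make_require_root}


def build_pipeline(definition):
    """Create each declared step, then connect the steps in declaration order."""
    steps = [STEP_LIBRARY[decl["step"]](decl.get("options", {})) for decl in definition]

    def pipeline(doc):
        for step in steps:
            doc = step(doc)
        return doc

    return pipeline


# The "program": a declarative description of the pipeline, not code.
definition = [
    {"step": "require-root", "options": {"name": "catalog"}},
    {"step": "rename", "options": {"from": "item", "to": "entry"}},
]

run = build_pipeline(definition)
result = ET.tostring(run(ET.fromstring("<catalog><item>x</item></catalog>")), encoding="unicode")
print(result)
# <catalog><entry>x</entry></catalog>
```

Keeping the definition as data means the same engine can run any pipeline that uses steps it knows about, which is the role the text assigns to a pipeline engine.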

Standards

Product-specific

Pipe granularity

Different XML pipeline implementations support different granularities of flow.

Standardization

Until May 2010, there was no widely used standard for XML pipeline languages. XProc became a W3C Recommendation in May 2010, [6] and wider adoption was expected to follow.

History

References

  1. "XProc: An XML Pipeline Language". W3.org. Retrieved 2013-06-14.
  2. "W3C XML Pipeline Definition Language".
  3. "XML Pipeline Language (XPL) Version 1.0 (Draft)". W3.org. Retrieved 2013-06-14.
  4. "XML Pipeline Definition Language Version 1.0". W3.org. 2002-02-28. Retrieved 2013-06-14.
  5. "XML pipelines: XPL and XProc". Orbeon. 2007-05-22. Retrieved 2012-03-14.
  6. "XProc: An XML Pipeline Language". W3.org. Retrieved 2013-06-14.
  7. "Early Unix history and evolution". Cm.bell-labs.com. Archived from the original on 2015-04-08. Retrieved 2013-06-14.
  8. "FAQ". Xpipe.sourceforge.net. 2001-12-09. Retrieved 2013-06-14.
