Streaming Transformations for XML

Streaming Transformations for XML (STX) is an XML transformation language intended as a high-speed, low-memory alternative to XSLT versions 1.0 and 2.0. XSLT 3.0 has since added streaming capabilities of its own.

Overview

STX is an XML-based language for efficient processing of XML streams. XSLT 1.0 and 2.0 are not well suited to stream-based processing, and STX fills this niche.

Conventional XML processing loads the entire XML document into memory before any work begins. In contrast, SAX (the Simple API for XML) streams XML events such as "open element," "close element," and "text node," so that other software can begin interpreting information immediately, before the end of the file is reached. Unfortunately, some software cannot use XML fragments this way and must build up the whole document before it can begin processing. This is the case with XSLT: because an XPath expression in XSLT can select any node anywhere in the document, the processor must have the entire document available in memory.
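The event model described above can be illustrated with a minimal Python sketch that uses the standard library's `xml.sax` module. This is not STX; it only shows events being delivered while the document is still arriving. The `EventLogger` class and the sample `<order>` document are hypothetical names chosen for this illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX event as it arrives; no document tree is built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("open element", name))

    def endElement(self, name):
        self.events.append(("close element", name))

    def characters(self, content):
        if content.strip():  # ignore whitespace-only text
            self.events.append(("text node", content))


handler = EventLogger()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Feed the document in two pieces to imitate data arriving over time.
parser.feed("<order><item>tea</item>")
events_so_far = list(handler.events)  # snapshot taken before the document ends

parser.feed("<item>milk</item></order>")
parser.close()

print("after first chunk:", events_so_far)
print("after whole document:", handler.events)
```

The snapshot taken after the first `feed` call already contains events, even though the closing `</order>` tag has not yet been seen.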

STX restricts queries to the immediate context of the current node, so it can start transforming and emitting SAX events as soon as they arrive. Because it can discard each node immediately after processing it, memory use is significantly lower than that of XSLT. This limited query scope is a defining characteristic of STX.
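The effect of a limited query scope can be sketched in Python with a SAX handler that writes output as events arrive and keeps only the names of the currently open elements. This is an analogy written for illustration, not STX syntax or the behavior of any STX processor; `RenameItems` and the sample document are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class RenameItems(xml.sax.ContentHandler):
    """Rename <item> to <product> when its parent is <order>.

    The only state kept is the stack of open element names, so memory use
    grows with nesting depth rather than with document size.
    """

    def __init__(self, out):
        super().__init__()
        self.out = XMLGenerator(out, encoding="utf-8")
        self.ancestors = []  # names of the elements currently open

    def _output_name(self, name):
        # The "query" looks only at the current node and its parent.
        parent = self.ancestors[-1] if self.ancestors else None
        return "product" if (name, parent) == ("item", "order") else name

    def startElement(self, name, attrs):
        self.out.startElement(self._output_name(name), attrs)
        self.ancestors.append(name)

    def endElement(self, name):
        self.ancestors.pop()
        self.out.endElement(self._output_name(name))

    def characters(self, content):
        self.out.characters(content)  # passed through immediately


source = b"<order><item>tea</item><note><item>n/a</item></note></order>"
result = io.StringIO()
xml.sax.parseString(source, RenameItems(result))
print(result.getvalue())
```

Only the `<item>` directly under `<order>` is renamed; the one nested inside `<note>` is copied unchanged, because the rule consults nothing beyond the current element and its parent.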

This architectural decision intentionally makes STX a niche language. It would be wrong to call STX a general-purpose transformation language; however, for transformations that fit within its limited query scope, it is an efficient choice.

Specifications

STX's query language is called STXPath and is based on XPath 2.0.

Implementations of STX are available in Java and Perl.

Similar projects

Unlike STX, which is declared using an XML syntax, these two projects associate SAX events with callback functions:

Related Research Articles

XML Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

XSLT is a language for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. XSLT 1.0 is widely supported in modern web browsers.

In computing, the Java API for XML Processing, or JAXP, one of the Java XML Application programming interfaces, provides the capability of validating and parsing XML documents. It has three basic parsing interfaces:

SAX is an event-driven online algorithm for parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole—building the full abstract syntax tree of an XML document for convenience of the user—SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making a single pass through the input stream.

In software, an XML pipeline is formed when XML processes, especially XML transformations and XML validations, are connected.

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath.

Saxon is an XSLT and XQuery processor created by Michael Kay and now developed and maintained by his company, Saxonica. There are open-source and also closed-source commercial versions. Versions exist for Java, JavaScript and .NET.

XPath 2.0 is a version of the XPath language defined by the World Wide Web Consortium, W3C. It became a recommendation on 23 January 2007. As a W3C Recommendation it was superseded by XPath 3.0 on 10 April 2014.

eXist-db is an open source software project for NoSQL databases built on XML technology. It is classified as both a NoSQL document-oriented database system and a native XML database. Unlike most relational database management systems (RDBMS) and NoSQL databases, eXist-db provides XQuery and XSLT as its query and application programming languages.

The identity transform is a data transformation that copies the source data into the destination data without change.

XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.

Oxygen XML Editor

The Oxygen XML Editor is a multi-platform XML editor, XSLT/XQuery debugger and profiler with Unicode support. It is a Java application, so it can run in Windows, Mac OS X, and Linux. It also has a version that can run as an Eclipse plugin.

XMLStarlet is a set of command-line utilities (a toolkit) to query, transform, validate, and edit XML documents and files using a simple set of shell commands, in a way similar to the UNIX grep, sed, awk, diff, patch, and join commands.

XPath is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML or text, with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of cross-platform XML processing technologies centered on a non-extractive, "document-centric" XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as one of the following:

XQuery API for Java

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

XML transformation language

An XML transformation language is a programming language designed specifically to transform an input XML document into an output document which satisfies some specific goal.

srcML is a document-oriented XML representation of source code. It was created in a collaborative effort between Michael L. Collard and Jonathan I. Maletic. The abbreviation, srcML, is short for Source Markup Language. srcML wraps source code (text) with information from the Abstract Syntax Tree or AST (tags) into a single XML document. All original text is preserved so that the original source code document can be recreated from the srcML markup. The only exception is the possibility of newline normalization.

Tritium is a simple scripting language for efficiently transforming structured data like HTML, XML, and JSON. It is similar in purpose to XSLT but has a syntax influenced by jQuery, Sass, and CSS versus XSLT's XML based syntax.