XQuery

Paradigm: declarative, functional, modular
Designed by: W3C
First appeared: 2007
Stable release: 3.1 / March 21, 2017 [1]
Typing discipline: dynamic or static, [2] [3] strong
OS: Cross-platform
Filename extensions: .xq, .xql, .xqm, .xqy, .xquery
Website: www.w3.org/XML/Query/
Major implementations: Many
Influenced by: XPath, SQL, XSLT

XQuery (XML Query) is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML or text and, with vendor-specific extensions, other data formats (JSON, binary, etc.). The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

XQuery 1.0 became a W3C Recommendation on January 23, 2007. [4]

XQuery 3.0 became a W3C Recommendation on April 8, 2014. [5]

XQuery 3.1 became a W3C Recommendation on March 21, 2017. [6]

"The mission of the XML Query project is to provide flexible query facilities to extract data from real and virtual documents on the World Wide Web, therefore finally providing the needed interaction between the Web world and the database world. Ultimately, collections of XML files will be accessed like databases." [7]

Features

XQuery is a functional, side effect-free, expression-oriented programming language with a simple type system, summed up by Kilpeläinen: [8]

All XQuery expressions operate on sequences, and evaluate to sequences. Sequences are ordered lists of items. Items can be either nodes, which represent components of XML documents, or atomic values, which are instances of XML Schema base types like xs:integer or xs:string. Sequences can also be empty, or consist of a single item only. No distinction is made between a single item and a singleton sequence. (...) XQuery/XPath sequences differ from lists in languages like Lisp and Prolog by excluding nested sequences. Designers of XQuery may have considered nested sequences unnecessary for the manipulation of document contents. Nesting, or hierarchy of document structures is instead represented by nodes and their child-parent relationships

XQuery provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases [9] or office documents.

XQuery's syntax is a superset of XPath expression syntax, which it uses to address specific parts of an XML document. It supplements this with a SQL-like "FLWOR expression" for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN.
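As a minimal sketch of a FLWOR join, the query below pairs each book with its author. The inline `$authors` and `$books` elements and all of their names are hypothetical stand-ins for two documents; they do not come from any real data set.

```xquery
(: Hypothetical inline data standing in for two separate documents. :)
let $authors :=
  <authors>
    <author id="a1"><name>Ada</name></author>
    <author id="a2"><name>Ben</name></author>
  </authors>
let $books :=
  <books>
    <book author="a2"><title>Queries</title></book>
    <book author="a1"><title>Trees</title></book>
    <book author="a1"><title>Graphs</title></book>
  </books>
for $b in $books/book                        (: FOR: iterate over the books :)
let $a := $authors/author[@id = $b/@author]  (: LET: bind the matching author :)
where exists($a)                             (: WHERE: keep only matched pairs :)
order by $a/name, $b/title                   (: ORDER BY: sort by author, then title :)
return                                       (: RETURN: build one element per pair :)
  <entry author="{$a/name}" title="{$b/title}"/>
```

With this data the query yields three `entry` elements, ordered Ada/Graphs, Ada/Trees, Ben/Queries. The XPath predicate in the `let` clause plays the role of the join condition.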

The language also provides syntax allowing new XML documents to be constructed. Where the element and attribute names are known in advance, an XML-like syntax can be used; in other cases, expressions referred to as dynamic node constructors are available. All these constructs are defined as expressions within the language, and can be arbitrarily nested.
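The two construction styles can be compared side by side. This is only an illustrative sketch: the element and attribute names, and the variables `$tag` and `$attr`, are hypothetical.

```xquery
(: Direct constructor: element and attribute names are fixed in the query text. :)
let $direct := <item id="1">pen</item>

(: Computed (dynamic) constructors: names are produced by expressions at run time. :)
let $tag := "item"
let $attr := "id"
let $computed :=
  element { $tag } {
    attribute { $attr } { 1 },
    text { "pen" }
  }

(: Constructors are ordinary expressions, so they nest freely inside other
   expressions and inside each other. :)
return
  <result>
    { $direct, $computed }
    <same>{ deep-equal($direct, $computed) }</same>
  </result>
```

The `same` element reports whether the two forms built structurally equal elements, which is a convenient way to check that a computed constructor mirrors a direct one.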

The language is based on the XQuery and XPath Data Model (XDM) which uses a tree-structured model of the information content of an XML document, containing seven kinds of nodes: document nodes, elements, attributes, text nodes, comments, processing instructions, and namespaces.

XDM also models all values as sequences (a singleton value is considered to be a sequence of length one). The items in a sequence can either be XML nodes or atomic values. Atomic values may be integers, strings, booleans, and so on: the full list of types is based on the primitive types defined in XML Schema.
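The sequence rules described above can be checked with a few small expressions. This is a sketch using arbitrary sample values:

```xquery
(: Parentheses group items but never create nested sequences. :)
let $flat := (1, (2, 3), (), ("a", "b"))
return (
  count($flat),          (: 5, because the items are 1, 2, 3, "a", "b" :)
  $flat[3],              (: 3, the third item of the flattened sequence :)
  deep-equal(42, (42)),  (: true, a single item is a singleton sequence :)
  empty(())              (: true, the empty sequence has no items :)
)
```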

Features for updating XML documents or databases, and full-text search capability, are not part of the core language but are defined in add-on extension standards: XQuery Update Facility 1.0 adds update operations, and XQuery and XPath Full Text 1.0 adds full-text search in XML documents.
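As a hedged illustration of the update extension, the sketch below uses the copy/modify/return (transform) expression from XQuery Update Facility 1.0 to produce a modified copy of a document without changing the original. It assumes a processor that implements that extension; the `memo` document is hypothetical.

```xquery
(: Assumes a processor that supports XQuery Update Facility 1.0. :)
let $doc :=
  <memo>
    <para>Ship on Friday.</para>
    <NOTE>internal only</NOTE>
  </memo>
return
  copy $c := $doc               (: work on a fresh copy of the input :)
  modify delete nodes $c//NOTE  (: pending update: drop every NOTE element :)
  return $c                     (: the copy, with the updates applied :)
```

Because the updates are applied to the copy, the expression as a whole remains side-effect free and can be used wherever any other expression is allowed.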

XQuery 3.0 adds support for full functional programming, in that functions are values that can be manipulated (stored in variables, passed to higher-order functions, and dynamically called).
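A short sketch of functions used as values, assuming an XQuery 3.0 processor. The helper `local:apply-twice` is a hypothetical name introduced only for this illustration.

```xquery
xquery version "3.0";

declare function local:doubler($x) { $x * 2 };

(: A higher-order function: it receives another function as its first argument. :)
declare function local:apply-twice($f as function(*), $x) {
  $f($f($x))
};

let $double := local:doubler#1       (: named function reference, arity 1 :)
let $inc := function($n) { $n + 1 }  (: inline (anonymous) function :)
return (
  local:apply-twice($double, 5),     (: 20 :)
  local:apply-twice($inc, 5),        (: 7 :)
  for-each((1, 2, 3), $double)       (: 2, 4, 6 :)
)
```

Here functions are stored in variables, passed as arguments, and called dynamically with `$f(...)`, which are the three capabilities the paragraph above lists.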

Examples

The sample XQuery code below lists the unique speakers in each act of Shakespeare's play Hamlet, encoded in hamlet.xml:

<html><body>
{
  for $act in doc("hamlet.xml")//ACT
  let $speakers := distinct-values($act//SPEAKER)
  return
    <div>
      <h1>{ string($act/TITLE) }</h1>
      <ul>
      {
        for $speaker in $speakers
        return <li>{ $speaker }</li>
      }
      </ul>
    </div>
}
</body></html>

All XQuery constructs for performing computations are expressions. There are no statements, even though some of the keywords appear to suggest statement-like behaviors. To execute a function, the expression within the body is evaluated and its value is returned. Thus to write a function to double an input value, one simply writes:

declare function local:doubler($x) { $x * 2 };

To write a full query saying 'Hello World', one writes the expression:

"Hello World"

This style is common in functional programming languages.

Applications

Below are a few examples of how XQuery can be used:

  1. Extracting information from a database for use in a web service.
  2. Generating summary reports on data stored in an XML database.
  3. Searching textual documents on the Web for relevant information and compiling the results.
  4. Selecting and transforming XML data to XHTML to be published on the Web.
  5. Pulling data from databases to be used for the application integration.
  6. Splitting up an XML document that represents multiple transactions into multiple XML documents.

XQuery and XSLT compared

Scope

Although XQuery was initially conceived as a query language for large collections of XML documents, it is also capable of transforming individual documents. As such, its capabilities overlap with XSLT, which was designed expressly to allow input XML documents to be transformed into HTML or other formats.

The XSLT 2.0 and XQuery standards were developed by separate working groups within W3C, working together to ensure a common approach where appropriate. They share the same data model (XDM), type system, and function library, and both include XPath 2.0 as a sublanguage.

Origin

The two languages, however, are rooted in different traditions and serve the needs of different communities. XSLT was primarily conceived as a stylesheet language whose primary goal was to render XML for the human reader on screen, on the web (as web template language), or on paper. XQuery was primarily conceived as a database query language in the tradition of SQL.

Because the two languages originate in different communities, XSLT is stronger in its handling of narrative documents with more flexible structure, while XQuery is stronger in its data handling (for example, when performing relational joins).

Versions

XSLT 1.0 appeared as a Recommendation in 1999, whereas XQuery 1.0 only became a Recommendation in early 2007; as a result, XSLT is still much more widely used. Both languages have similar expressive power, though XSLT 2.0 has many features that are missing from XQuery 1.0, such as grouping, number and date formatting, and greater control over XML namespaces. [10] [11] [12] Many of these features were planned for XQuery 3.0. [13]
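Grouping is one of the features that did arrive in XQuery 3.0, as a `group by` clause in FLWOR expressions. A minimal sketch, assuming an XQuery 3.0 processor and using hypothetical sales data:

```xquery
xquery version "3.0";

let $sales :=
  <sales>
    <sale region="north" amount="10"/>
    <sale region="south" amount="5"/>
    <sale region="north" amount="7"/>
  </sales>
for $s in $sales/sale
group by $region := $s/@region  (: one iteration per distinct region value :)
order by $region
return
  (: After grouping, $s holds every sale in the current group. :)
  <total region="{$region}" amount="{sum($s/@amount)}"/>
```

In XQuery 1.0 the same result typically required `distinct-values` plus a second pass over the data for each key.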

Any comparison must take into account the version of XSLT. XSLT 1.0 and XSLT 2.0 are very different languages. XSLT 2.0, in particular, has been heavily influenced by XQuery in its move to strong typing and schema-awareness.

Strengths and weaknesses

Usability studies have shown that XQuery is easier to learn than XSLT, especially for users with previous experience of database languages such as SQL. [14] This can be attributed to the fact that XQuery is a smaller language with fewer concepts to learn, and to the fact that programs are more concise. It is also true that XQuery is more orthogonal, in that any expression can be used in any syntactic context. By contrast, XSLT is a two-language system in which XPath expressions can be nested in XSLT instructions but not vice versa.

XSLT is currently stronger than XQuery for applications that involve making small changes to a document (for example, deleting all the NOTE elements). Such applications are generally handled in XSLT by use of a coding pattern that involves an identity template that copies all nodes unchanged, modified by specific templates that modify selected nodes. XQuery has no direct equivalent to this coding pattern, though the XQuery Update Facility, where an implementation supports it, offers another way to tackle such problems. [15]

XQuery 1.0 lacked any kind of mechanism for dynamic binding or polymorphism; this has been remedied with the introduction of functions as first-class values in XQuery 3.0. The absence of this capability starts to become noticeable when writing large applications, or when writing code that is designed to be reusable in different environments. XSLT offers two complementary mechanisms in this area: the dynamic matching of template rules, and the ability to override rules using xsl:import, that make it possible to write applications with multiple customization layers.

The absence of these facilities from XQuery 1.0 was a deliberate design decision: it has the consequence that XQuery is very amenable to static analysis, which is essential to achieve the level of optimization needed in database query languages. This also makes it easier to detect errors in XQuery code at compile time.

The fact that XSLT 2.0 uses XML syntax makes it rather verbose in comparison to XQuery 1.0. However, many large applications take advantage of this capability by using XSLT to read, write, or modify stylesheets dynamically as part of a processing pipeline. The use of XML syntax also enables the use of XML-based tools for managing XSLT code. By contrast, XQuery syntax is more suitable for embedding in traditional programming languages such as Java (see XQuery API for Java) or C#. If necessary, XQuery code can also be expressed in an XML syntax called XQueryX. The XQueryX representation of XQuery code is rather verbose and not convenient for humans, but can easily be processed with XML tools, for example transformed with XSLT stylesheets. [16] [17]

Extensions and future work

W3C extensions

Two major extensions to XQuery were developed by the W3C: the XQuery Update Facility [15] and XQuery and XPath Full Text 1.0. [18]

Both reached Recommendation status as extensions to XQuery 1.0, but work on taking them forward to work with XQuery 3.0 was abandoned for lack of resources.

XQuery 3.0 was published as a Recommendation on 8 April 2014, [19] and XQuery 3.1 became a Recommendation on 21 March 2017.

A scripting (procedural) extension for XQuery was designed, but never completed. [20] [21]

The EXPath Community Group [22] develops extensions to XQuery and other related standards (XPath, XSLT, XProc, and XForms). The following extensions are currently available: Packaging System, [23] File Module, [24] Binary Module, [25] and Web Applications. [26]

Third-party extensions

JSONiq is an extension of XQuery that adds support to extract and transform data from JSON documents. JSONiq is a superset of XQuery 3.0. It is published under the Creative Commons Attribution-ShareAlike 3.0 license.

The EXQuery [27] project develops standards around creating portable XQuery applications. The available standards include RESTXQ 1.0: RESTful Annotations for XQuery. [28]

Implementations

Overview of popular XQuery implementations
Name       License                 Language  XQuery 3.1  XQuery 3.0  XQuery 1.0  XQuery Update 1.0  XQuery Full Text 1.0
BaseX      BSD license             Java      Yes         Yes         Yes         Yes                Yes
eXist      LGPL                    Java      Partial     Partial     Yes         No                 No
MarkLogic  Proprietary             C++       No          Partial     Yes         No                 No
Saxon HE   Mozilla Public License  Java      Partial     Partial     Yes         Yes                No
Saxon EE   Proprietary             Java      Yes         Yes         Yes         Yes                No
Zorba      Apache License          C++       No          Yes         Yes         Yes                Yes

References

  1. "XQuery 3.1 Recommendation". 2017-03-21.
  2. "XQuery 3.1: An XML Query Language". 2017-03-21.
  3. "XQuery and Static Typing".
  4. "XML and Semantic Web W3C Standards Timeline" (PDF). 2012-02-04.
  5. "XQuery 3.0 Recommendation". 2014-04-08.
  6. "XQuery 3.1 Recommendation". 2017-03-21.
  7. W3C (2003-10-25). "cited by J.Robie".
  8. Kilpeläinen, Pekka (2012). "Using XQuery for problem solving" (PDF). Software: Practice and Experience. 42 (12): 1433–1465. doi:10.1002/spe.1140. S2CID 15561027.
  9. "Data retrieval with XQuery". Retrieved on 18 January 2016.
  10. Kay, Michael (May 2005). "Comparing XSLT and XQuery".
  11. Eisenberg, J. David (2005-03-09). "Comparing XSLT and XQuery".
  12. Smith, Michael (2001-02-23). "XQuery, XSLT "overlap" debated".
  13. "XQuery 3.0 requirements".
  14. Graaumans, Joris. "Usability of XML Query Languages". SIKS Dissertation Series No 2005-16. ISBN 90-393-4065-X.
  15. "XQuery Update Facility".
  16. "XML Syntax for XQuery (XQueryX)".
  17. Michael Kay. "Saxon diaries: How not to fold constants".
  18. XQuery and XPath Full Text 1.0
  19. XML Query (XQuery) 3.0
  20. XQuery Scripting Extension 1.0 Requirements
  21. XQuery 1.0 Scripting Extension
  22. EXPath Community Group
  23. Packaging System
  24. File Module
  25. Binary Module
  26. Web Applications
  27. "Standard for portable XQuery applications". Retrieved 12 December 2013.
  28. "RESTXQ 1.0: RESTful Annotations for XQuery".
