XPath 3

Last updated

XPath 3 is the latest version of the XML Path Language, a query language for selecting nodes in XML documents. It supersedes XPath 1.0 and XPath 2.0.

Contents

XPath 3.0 became a W3C Recommendation on 8 April 2014, while XPath 3.1 became a W3C Recommendation on 21 March 2017.

New features in XPath 3.0

Compared to XPath 2.0, XPath 3.0 adds the following new features:

Inline function expressions
Anonymous functions can be created in an expression context. For example, the expression function($aasxs:double,$basxs:double)asxs:double{$a*$b} creates a function that returns the product of its two arguments. The expression collection()/(let$a:=.returnfunction(){$a}) creates a sequence of functions, each one returning a different node from a collection.
Dynamic function calls
Function values may be called without being referenced by name. For example, $f[2]("Hi there") fetches the second item from sequence $f, and invokes it as a function, passing the string "Hi there" as argument.
Union types
Union types, as defined in XML Schema, may be used in type conversions and function type signatures.
Namespace literals
XML namespaces may be referenced using braced URI literals. For example, the qualified name math:pi may be expanded to Q{http://www.w3.org/2005/xpath-functions/math}pi, embedding the namespace URI inside the prefix.
String concatenation operator
The new || operator may be used for string concatenation: $a || $b is equivalent to fn:concat($a,$b).
Mapping operator
The new ! operator performs simple mapping: E1 ! E2 evaluates E2 for each item in the sequence E1, and concatenates the resulting items. This is comparable to the path operator /, but the ! operator does not perform duplicate elimination nor document ordering of the results.

New features in XPath 3.1

XPath 3.1 mainly adds support for array and map (associative array) data types. These types and their associated functionality are intended to ease working with JSON data.

Another innovation is the arrow operator => for function chaining. For example, the XPath 2.0 expression

contains(upper-case(substring-before($in,' ')),'X')

can now be written

$in=>substring-before(' ')=>upper-case()=>contains('X')

Related Research Articles

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

A Uniform Resource Identifier (URI) is a unique sequence of characters that identifies a logical or physical resource used by web technologies. URIs may be used to identify anything, including real-world objects, such as people and places, concepts, or information resources such as web pages and books. Some URIs provide a means of locating and retrieving information resources on a network ; these are Uniform Resource Locators (URLs). A URL provides the location of the resource. A URI identifies the resource by name at the specified location or URL. Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it, these are Uniform Resource Names (URNs). The web technologies that use URIs are not limited to web browsers. URIs are used to identify anything described using the Resource Description Framework (RDF), for example, concepts that are part of an ontology defined using the Web Ontology Language (OWL), and people who are described using the Friend of a Friend vocabulary would each have an individual URI.

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

XSLT is a language originally designed for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. Support for JSON and plain-text transformation was added in later updates to the XSLT 1.0 specification.

XSD, a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document, to assure it adheres to the description of the element it is placed in.

A string literal or anonymous string is a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in x = "foo", where "foo" is a string literal with value foo. Methods such as escape sequences can be used to avoid the problem of delimiter collision and allow the delimiters to be embedded in a string. There are many alternate notations for specifying string literals especially in complicated cases. The exact notation depends on the programming language in question. Nevertheless, there are general guidelines that most modern programming languages follow.

A list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists. It follows the form of the mathematical set-builder notation as distinct from the use of map and filter functions.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

XPath 2.0 is a version of the XPath language defined by the World Wide Web Consortium, W3C. It became a recommendation on 23 January 2007. As a W3C Recommendation it was superseded by XPath 3.0 on 10 April 2014.

In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.

String functions are used in computer programming languages to manipulate a string or query information about a string.

XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.

<span class="mw-page-title-main">RDFLib</span> Python library to serialize, parse and process RDF data

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information. This library contains parsers/serializers for almost all of the known RDF serializations, such as RDF/XML, Turtle, N-Triples, & JSON-LD, many of which are now supported in their updated form. The library also contains both in-memory and persistent Graph back-ends for storing RDF information and numerous convenience functions for declaring graph namespaces, lodging SPARQL queries and so on. It is in continuous development with the most recent stable release, rdflib 6.1.1 having been released on 20 December 2021. It was originally created by Daniel Krech with the first release in November, 2002.

N-Triples is a format for storing and transmitting data. It is a line-based, plain text serialisation format for RDF graphs, and a subset of the Turtle format. N-Triples should not be confused with Notation3 which is a superset of Turtle. N-Triples was primarily developed by Dave Beckett at the University of Bristol and Art Barstow at the World Wide Web Consortium (W3C).

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

<span class="mw-page-title-main">XQuery API for Java</span> Application programming interface

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

Zorba is an open source query processor written in C++, implementing

JSONiq is a query and functional programming language that is designed to declaratively query and transform collections of hierarchical and heterogeneous data in format of JSON, XML, as well as unstructured, textual data.

References