JSONiq

Paradigm: declarative, functional, modular
Typing discipline: dynamic, strong
OS: cross-platform
Filename extensions: .jq, .jqy
Website: www.jsoniq.org
Influenced by: XQuery, SQL

JSONiq is a query and functional programming language designed to declaratively query and transform collections of hierarchical and heterogeneous data in JSON and XML format, as well as unstructured textual data.

JSONiq is an open specification published under the Creative Commons Attribution-ShareAlike 3.0 license. It is based on the XQuery language, with which it shares the same core expressions and operations on atomic types. JSONiq comes in two syntactic flavors, both of which support JSON and XML natively:

  1. The JSONiq syntax (a superset of JSON) extended with XML support through a compatible subset of XQuery.
  2. The XQuery syntax (native XML support) extended with JSON support through a compatible subset (the JSONiq extension to XQuery) of the above JSONiq syntax.

Features

JSONiq primarily provides means to extract and transform data from JSON documents or any data source that can be viewed as JSON (e.g. relational databases or web services).

The major expression for performing such operations is the SQL-like "FLWOR expression" that comes from XQuery. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN. It also supports clauses for grouping and windowing.
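As a minimal sketch of the five clauses working together, the query below runs over a small inline sequence of objects. The data and the variable names ($people, $decade) are invented for illustration; a real query would more likely read from a collection.

```jsoniq
(: Hypothetical inline data, used only for illustration. :)
let $people := (
  { "name" : "Ada",   "age" : 36 },
  { "name" : "Bo",    "age" : 19 },
  { "name" : "Chris", "age" : 52 }
)
for $p in $people                 (: FOR: iterate over the items of a sequence :)
let $decade := $p.age idiv 10     (: LET: bind an intermediate value :)
where $p.age gt 20                (: WHERE: keep only matching items :)
order by $p.age descending        (: ORDER BY: sort the remaining items :)
return { "name" : $p.name, "decade" : $decade }   (: RETURN: build one result per item :)
```

Under these assumptions the query drops the 19-year-old and returns one object for Chris followed by one for Ada.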

The language also provides syntax for constructing new JSON documents whose field names and values are either known in advance or computed dynamically. The JSONiq language (not the extension to XQuery) is a superset of JSON. That is, each JSON document is a valid JSONiq program.
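The following sketch illustrates both styles of construction in one object: a field that is written out literally, as in plain JSON, and fields whose key or value is computed by an expression. The variable $digits and all field names are made up for the example.

```jsoniq
let $digits := (1, 2, 3)
return {
  (: Key and value known in advance, exactly as in JSON. :)
  "fixed" : "known in advance",
  (: Key computed by string concatenation, value computed by a function call. :)
  "area" || "-code" : sum($digits),
  (: Array members computed by a nested FLWOR expression. :)
  "squares" : [ for $d in $digits return $d * $d ]
}
```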

The language also supports a navigational syntax for extracting field names and values out of JSON objects, as well as values out of JSON arrays. Navigation is resilient when values are absent or heterogeneous: it silently ignores unforeseen values without raising errors.
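The navigation constructs can be sketched on a small, invented person object as follows. The field names and values are assumptions chosen for illustration, and the example uses the JSONiq syntax rather than the XQuery syntax.

```jsoniq
(: Hypothetical sample object. :)
let $person := {
  "firstName" : "John",
  "phoneNumber" : [
    { "type" : "home",   "number" : "212 555-1234" },
    { "type" : "office", "number" : "646 555-4567" }
  ]
}
return (
  $person.firstName,                 (: object lookup by field name :)
  $person.phoneNumber[[1]].number,   (: array lookup by position; positions start at 1 :)
  $person.phoneNumber[].type,        (: array unboxing, then a lookup on each member :)
  $person.address.city,              (: missing field: yields the empty sequence, no error :)
  keys($person)                      (: the field names of the object :)
)
```

The lookup on the missing "address" field contributes nothing to the result instead of failing, which is the resilience described above.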

All constructs are defined as expressions within the language and can be arbitrarily nested.

JSONiq does not include features for updating JSON or XML documents, has no full-text search capabilities, and has no statements. All of these features are under active development for a subsequent version of the language.

JSONiq is a programming language that can express arbitrary JSON-to-JSON or XML-to-XML transformations. It also allows for transformations between JSON and XML. All such transformations have the following features:

  1. Logical/physical data independence
  2. Declarative
  3. High level
  4. Side-effect free
  5. Strongly typed

Data model

The language is based on the JSONiq Data Model (JDM), which is an extension of the XQuery and XPath Data Model (XDM). The JDM uses a tree-structured model of the information content of a JSON or XML document. It contains JSON objects, JSON arrays, all kinds of XML nodes, and atomic values such as integers, strings, or booleans, all of which are defined in XML Schema.

JDM forms the basis for a set-oriented language, in that instances of the data model are sequences (a singleton value is considered to be a sequence of length one). The items in a sequence can be JSON objects, JSON arrays, XML nodes, or atomic values.
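A short sketch of how sequences behave, assuming the usual XDM rule that sequences are flat (nested parentheses do not create nested sequences). The variable $seq and the field names are illustrative only.

```jsoniq
(: The inner parentheses are flattened away; the array stays a single item. :)
let $seq := (1, ("two", { "three" : 3 }), [ 4, 5 ])
return {
  "items" : count($seq),      (: four items: an integer, a string, an object, an array :)
  "singleton" : count(42),    (: a single value is a sequence of length one :)
  "empty" : count(())         (: the empty sequence has no items :)
}
```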

Examples

The sample JSONiq code below groups all people older than 20 by the area code of their home phone number and counts the people in each group, reading from a collection of JSON person objects (see the JSON article for an example object).

    for $p in collection("persons")
    where $p.age gt 20
    let $home := $p.phoneNumber[][$$.type eq "home"].number
    group by $area := substring-before($home, " ")
    return
      {
        "area code" : $area,
        "count" : count($p)
      }

All JSONiq constructs are expressions and can also be contained in the body of a function.

    declare function local:adults()
    {
      for $p in collection("persons")
      where $p.age gt 20
      return $p
    };

The next query transforms parts of each person object into an XML element using the XQuery syntax (JSONiq extension to XQuery).

    for $p in collection("persons")
    return
      <person>
        <firstName>{$p("firstName")}</firstName>
        <lastName>{$p("lastName")}</lastName>
        <age>{$p("age")}</age>
      </person>

Applications

Below are a few examples of how and where JSONiq can be used:

  1. Extracting information out of a database to use in a web service.
  2. Generating summary reports on data stored in a JSON document store.
  3. Selecting and transforming JSON data to XHTML to be published on the Web.
  4. Correlating data from various sources and formats (e.g. JSON document store, XML database, relational database, and web service) and offering it in a web service.
  5. Transforming collections of JSON objects into a different schema.

Comparison of the two syntactic flavors

JSONiq has two syntaxes, and users can choose between them depending on whether they are focusing on JSON or XML. Both syntaxes use the same data model and are very similar, with a few exceptions.

JSONiq syntax

The pure JSONiq syntax is a superset of JSON. It is not, strictly speaking, a superset of XQuery, even though all of XQuery's expressions and semantics are available. The following aspects of the JSONiq syntax are not XQuery conformant:

  1. No names containing dots.
  2. No . for the context item ($$ has to be used instead).
  3. No single-quoted literals.
  4. JSON, backslash-based escaping in string literals.
  5. No axis step allowed at the beginning of a relative path expression.

XQuery syntax with JSONiq extension

The JSONiq extension to XQuery is a superset of XQuery but not a superset of JSON. It is fully conformant and backwards compatible with the XQuery 3.0 Candidate Recommendation. The following aspects of JSONiq are not supported in the XQuery syntax:

  1. No dot-based object lookup ($object("key") instead).
  2. No $$ for the context item.
  3. XML, ampersand-based escaping of string literals.
  4. Object keys must be quoted.
  5. No true/false/null literals.
  6. Built-in atomic types must be prefixed with xs:.
  7. Non-atomic types must be followed by parentheses.
  8. The empty-sequence() must be written as such.
  9. No array lookup and no [] array unboxing.
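To make a few of these differences concrete, here is a sketch of a simple query written in the XQuery syntax with the JSONiq extension, with comments noting how the same part would read in the pure JSONiq syntax. The collection name "persons" follows the examples above; the output field names are invented, and details may vary between implementations.

```jsoniq
(: XQuery syntax with the JSONiq extension. :)
for $p in collection("persons")
where $p("age") gt 20            (: JSONiq syntax: $p.age :)
return {
  "name" : $p("firstName"),      (: JSONiq syntax: $p.firstName :)
  "adult" : true()               (: JSONiq syntax: the literal true :)
}
```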
