Jq (programming language)

jq
Paradigms: Purely functional programming, JSON-oriented processing, tacit programming
Designed by: Stephen Dolan
First appeared: August 21, 2012
Stable release: 1.7.1 [1] / December 13, 2023
Implementation language: jq: C; gojq: Go; jaq: Rust; jqjq: jq
Platform: Cross-platform [note 1]
OS: Cross-platform [note 2]
License: MIT [note 3]
Website: jqlang.github.io/jq

jq is a very high-level lexically scoped functional programming language in which every JSON value is a constant. jq supports backtracking and managing indefinitely long streams of JSON data. It is related to the Icon and Haskell programming languages. The language supports a namespace-based module system and has some support for closures. In particular, functions and functional expressions can be used as parameters of other functions.

The original implementation of jq was in Haskell [3] before being immediately ported to C.

History

jq was created by Stephen Dolan, and released in October 2012. [4] [5] It was described as being "like sed for JSON data". [6] Support for regular expressions was added in jq version 1.5.

A "wrapper" program for jq named yq adds support for YAML, XML and TOML. It was first released in 2017. [7]

The Go implementation, gojq, was initially released in 2019. [8] gojq notably extends jq to include support for YAML.

The Rust implementation, jaq, has as its project goals a faster and more correct implementation of jq, while preserving compatibility with jq in most cases. Explicitly excluded from the project goals as of March 2024 are certain advanced features of jq such as modules, SQL-style operators, and a streaming parser for very large JSON documents. [9]

The jq implementation, jqjq, was initially released in 2022. jqjq notably can run itself, has a REPL and supports eval.

Usage

Command-line usage

jq is typically used at the command line and can be used with other command-line utilities, such as curl. Here is an example showing how the output of a curl command can be piped to a jq filter to determine the category names associated with this Wikipedia page:

$ curl -s 'https://en.wikipedia.org/w/api.php?action=parse&page=jq_(programming_language)&format=json' | jq '.parse.categories[]."*"'

The output produced by this pipeline consists of a stream of JSON strings, the first few of which are:

"Articles_with_short_description"
"Short_description_matches_Wikidata"
"Dynamically_typed_programming_languages"
"Functional_languages"
"Programming_languages"
"Programming_languages_created_in_2012"
"Query_languages"
"2012_software"

The curl command above uses the MediaWiki API for this page to produce a JSON response. The pipe character |, a standard Unix shell mechanism, passes the output of curl to jq as its input. [10]

The jq filter shown is an abbreviation for the jq pipeline:

.["parse"] | .["categories"] | .[] | .["*"] 

This corresponds to the nested JSON structure produced by the call to curl. Notice that the jq pipeline is built with the | character in the same manner as the Unix-style pipeline.
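The step-by-step reading of the filter can be sketched in Python. This is only an illustration: the sample response below is a hypothetical, heavily trimmed stand-in for the real API output, and the function name category_names is invented for this sketch. Unlike jq, which yields null when a key is missing, the sketch would raise a KeyError.

```python
import json

# Hypothetical, heavily trimmed stand-in for the MediaWiki API response.
SAMPLE_RESPONSE = json.loads("""
{"parse": {"title": "Jq (programming language)",
           "categories": [{"sortkey": "", "*": "Functional_languages"},
                          {"sortkey": "", "*": "Query_languages"}]}}
""")

def category_names(response):
    """Mimic .["parse"] | .["categories"] | .[] | .["*"] one step at a time."""
    parsed = response["parse"]          # .["parse"]
    categories = parsed["categories"]   # .["categories"]
    for entry in categories:            # .[]  (emit each array element in turn)
        yield entry["*"]                # .["*"]

# Print each result as a JSON string, one per line, as jq would.
for name in category_names(SAMPLE_RESPONSE):
    print(json.dumps(name))
```

Each stage consumes the output of the previous one, which is the same left-to-right data flow that the | character expresses in jq.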

Embedded usage

Both the C and the Go implementations provide libraries so that jq functionality can be embedded in other applications and programming environments.

For example, gojq has been integrated with SQLite so that a jq function is available in SQL statements. [11] This function is marked as "deterministic" and can therefore be used in "CREATE INDEX" commands. [12]

Modes of operation

By default, jq acts as a "stream editor" for JSON inputs, much as the sed utility can be thought of as a "stream editor" for lines of text. However, jq has several other modes of operation:

  1. it can treat its input from one or more sources as lines of text;
  2. it can gather a stream of inputs from a specified source into a JSON array;
  3. it can parse its JSON inputs using a so-called "streaming parser" that produces a stream of [path, value] arrays for all "leaf" paths.

The "streaming parser" is particularly useful when one or more of the JSON inputs is too large to fit into memory, since its memory requirements are typically quite small. For example, for an arbitrarily large array of JSON objects, the peak memory requirement is not much more than that required to handle the largest top-level object.

These modes of operation can, within certain limitations, be combined.
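To make the [path, value] form produced by the streaming parser more concrete, here is a minimal Python sketch that lists the leaf events for a small document. It is an illustration under stated assumptions, not jq's implementation: the function name leaf_events is invented here, the sketch walks an already-parsed value (so it shows the shape of the events, not the memory savings), and it omits the additional closing events that jq also emits.

```python
import json

def leaf_events(value, path=()):
    """Yield [path, value] pairs for every leaf of a parsed JSON value."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_events(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from leaf_events(child, path + (index,))
    else:
        # Scalars and empty containers are treated as leaves in this sketch.
        yield [list(path), value]

doc = json.loads('{"a": [1, {"b": null}], "c": "x"}')
for event in leaf_events(doc):
    print(json.dumps(event))
```

Each event carries the full path from the root, so a consumer can process one leaf at a time without holding the whole document.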

Syntax and semantics

Types

Every JSON value is itself a value in jq, which accordingly has the types shown in the table below. [13] The gojq and jaq implementations distinguish between integers and non-integer numbers. The gojq implementation supports unbounded-precision integer arithmetic, as did the original implementation of jq in Haskell.

Summary of jq's supported types
Type        Examples
"number"    3, 3.2, 1e6, nan, infinite
"string"    "Hello", "😐"
"boolean"   true, false
"array"     [1,"2",{"mixed":"type"},[3,4]]
"object"    {"one":1,"two":"2","three":[3]}
"null"      null

null is a value, just like any other JSON scalar; it is not a pointer or a "null-pointer". nan (corresponding to NaN) and infinite (see IEEE 754) are the only two jq scalars that are not also JSON values.
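The six type names can be illustrated with a small Python sketch that classifies values the way the table does. The function name jq_type is hypothetical, and the mapping from Python values to jq type names is an assumption of this sketch rather than part of any jq implementation.

```python
import json
import math

def jq_type(value):
    """Return the jq type name for a value as produced by json.loads."""
    if value is None:
        return "null"
    if isinstance(value, bool):   # checked before numbers: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON-like value: {value!r}")

samples = json.loads('[3, 3.2, 1e6, "Hello", true, [1, "2"], {"one": 1}, null]')
print([jq_type(v) for v in samples])

# nan and infinite are numbers too, even though JSON has no literal for them.
print(jq_type(math.nan), jq_type(math.inf))
```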

Forms

There are special syntactic forms for function creation, conditionals, stream reduction, and the module system.

Filters

Here is an example which shows how to define a named, parameterized filter for formatting an integer in any base from 2 to 36 inclusive. The implementation illustrates tacit (or point-free) programming:

# Use gojq for infinite precision integer arithmetic
def tobase($b):
    def digit: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[.:.+1];
    def mod: . % $b;
    def div: ((. - mod) / $b);
    def digits: recurse( select(. >= $b) | div) | mod ;

    select(2 <= $b and $b <= 36)
    | [digits | digit] | reverse | add;
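For cross-checking results, the same repeated divide-and-remainder computation can be sketched in Python, whose integers are also unbounded. The name to_base is invented for this sketch, and two behaviours are assumptions that differ from the jq filter: an out-of-range base raises an error instead of producing no output, and negative inputs are rejected.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    """Format the non-negative integer n in the given base (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36 inclusive")
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])   # least significant digit first
        if n == 0:
            break
    return "".join(reversed(digits))

print(to_base(255, 16))
print(to_base(10, 2))
print(to_base(2**64, 16))
```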

The next example demonstrates the use of generators in the classic "SEND MORE MONEY" verbal arithmetic game:

def send_more_money:
    def choose(m;n;used): ([range(m;n+1)] - used)[];
    def num(a;b;c;d): 1000*a + 100*b + 10*c + d;
    def num(a;b;c;d;e): 10*num(a;b;c;d) + e;
    first(
      1 as $m
      | 0 as $o
      | choose(8;9;[]) as $s
      | choose(2;9;[$s]) as $e
      | choose(2;9;[$s,$e]) as $n
      | choose(2;9;[$s,$e,$n]) as $d
      | choose(2;9;[$s,$e,$n,$d]) as $r
      | choose(2;9;[$s,$e,$n,$d,$r]) as $y
      | select(num($s;$e;$n;$d) + num($m;$o;$r;$e) ==
               num($m;$o;$n;$e;$y))
      | [$s,$e,$n,$d,$m,$o,$r,$e,$m,$o,$n,$e,$y] );
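The backtracking search performed by the generators can be sketched with Python generators, where each nested loop plays the role of one "choose(...) as $x" binding and the if statement plays the role of select. This is a rough analogue for readers more familiar with Python, not a translation that any jq implementation uses; the function names simply mirror those in the jq program.

```python
def choose(low, high, used):
    """Yield each integer in low..high (inclusive) that is not in used."""
    for digit in range(low, high + 1):
        if digit not in used:
            yield digit

def num(*digits):
    """Combine decimal digits, most significant first, into an integer."""
    total = 0
    for digit in digits:
        total = 10 * total + digit
    return total

def send_more_money():
    m, o = 1, 0
    for s in choose(8, 9, []):
        for e in choose(2, 9, [s]):
            for n in choose(2, 9, [s, e]):
                for d in choose(2, 9, [s, e, n]):
                    for r in choose(2, 9, [s, e, n, d]):
                        for y in choose(2, 9, [s, e, n, d, r]):
                            # Keep only assignments satisfying SEND + MORE == MONEY.
                            if num(s, e, n, d) + num(m, o, r, e) == num(m, o, n, e, y):
                                yield [s, e, n, d, m, o, r, e, m, o, n, e, y]

# next() stops after the first solution, like first(...) in the jq version.
solution = next(send_more_money())
print(solution)
```

The printed list spells out the digits of SEND, MORE and MONEY in order, corresponding to 9567 + 1085 = 10652.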

Parsing expression grammars

There is a very close relationship between jq and the parsing expression grammar (PEG) formalism. [14] The relationship stems from the equivalence of the seven basic PEG operations and the jq constructs shown in the following table.

Correspondence between PEG operations and jq equivalents
PEG operation name   PEG notation   jq operation or def
Sequence             e1 e2          e1 | e2
Ordered choice       e1 / e2        e1 // e2
Zero-or-more         e*             def star(E): (E | star(E)) // . ;
One-or-more          e+             def plus(E): E | (plus(E) // . );
Optional             e?             def optional(E): E // .;
And-predicate        &e             def amp(E): . as $in | E | $in;
Not-predicate        !e             def neg(E): select( [E] == [] );
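The definitions in the table can be mirrored with Python generator functions to show how they compose. This is a sketch under assumptions that are not part of the source: a parser is modelled as a function from the remaining input string to a stream of remaining strings, and the helper names literal, seq, choice and identity are invented here. As with any PEG, star and plus must not be applied to an expression that can match the empty string.

```python
def literal(s):
    """Recognise the literal string s at the start of the input."""
    def parse(text):
        if text.startswith(s):
            yield text[len(s):]
    return parse

def identity(text):          # jq's "."
    yield text

def seq(e1, e2):             # e1 | e2
    def parse(text):
        for rest in e1(text):
            yield from e2(rest)
    return parse

def choice(e1, e2):          # e1 // e2: use e2 only if e1 produces nothing
    def parse(text):
        results = list(e1(text))
        yield from (results if results else e2(text))
    return parse

def star(e):                 # def star(E): (E | star(E)) // . ;
    def parse(text):
        yield from choice(seq(e, star(e)), identity)(text)
    return parse

def plus(e):                 # def plus(E): E | (plus(E) // . );
    def parse(text):
        yield from seq(e, choice(plus(e), identity))(text)
    return parse

def optional(e):             # def optional(E): E // .;
    return choice(e, identity)

def amp(e):                  # def amp(E): . as $in | E | $in;
    def parse(text):
        for _ in e(text):
            yield text       # succeed without consuming input
    return parse

def neg(e):                  # def neg(E): select( [E] == [] );
    def parse(text):
        if not list(e(text)):
            yield text
    return parse

# Grammar a+ b? : the result lists what remains of the input after a match.
a_plus_b = seq(plus(literal("a")), optional(literal("b")))
print(list(a_plus_b("aaab")))
print(list(a_plus_b("b")))
```

An empty result list means the input was rejected, while a list containing the empty string means the whole input was consumed.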

Ports and variants

gojq is a "pure Go" implementation. There is also a Rust implementation of a dialect of jq named jaq [9] for which a denotational semantics has been specified. [15]

Notes

  1. Neither the C nor the Go implementations of jq has any runtime dependencies. [2]
  2. Including Windows, Linux, and macOS. The Go implementation can be compiled on any platform on which Go is supported. [2]
  3. The C implementation of jq uses a decimal floating-point library known as decNumber, which is licensed under the ICU license; and the Oniguruma regex library, which has a BSD license. [2]

References

Bibliography

  • Janssens, Jeroen (2021). Data Science at the Command Line. O'Reilly Media. ISBN 9781492087885.
  • Janssens, Jeroen (2014). Data Science at the Command Line: Facing the Future with Time-Tested Tools. O'Reilly Media. ISBN 9781491947807.
  • Marrs, Tom (2017). JSON at Work: Practical Data Integration for the Web. O'Reilly Media. ISBN 9781491982419.

Others

  1. "Release jq 1.7.1".
  2. "Download jq". jq. Retrieved January 6, 2023.
  3. "Initial · jqlang/Jq@eca89ac". GitHub.
  4. Janssens 2014.
  5. "jq". jq. Retrieved January 6, 2023.
  6. "like sed". Archived from the original on 2013-04-14.
  7. "Release v2.0.0 · kislyuk/yq". GitHub.
  8. "Release v0.0.1 · itchyny/gojq". GitHub.
  9. "01mf02/jaq: A jq clone focussed on correctness, speed, and simplicity". GitHub. Retrieved March 6, 2024.
  10. "Tutorial". jq. Retrieved January 6, 2023.
  11. "sqlite_jq". GitHub.
  12. "FAQ". GitHub.
  13. "Manual". jq. Retrieved January 6, 2023.
  14. "PEG". PEG.
  15. Färber, Michael (2023). "Denotational Semantics and a fast interpreter for jq". arXiv:2302.10576 [cs.LO].