Ion (serialization format)

Ion
Filename extension	.ion
Internet media type	application/ion
Developed by	Amazon
Type of format	Data interchange
Website	amzn.github.io/ion-docs/

Last updated December 24, 2024

Ion is a data serialization language developed by Amazon. It may be represented by either a human-readable text form or a compact binary form. The text form is a superset of JSON; thus, any valid JSON document is also a valid Ion document.

Data types

As a superset of JSON, Ion includes the following data types:

null: An empty value
bool: Boolean values
string: Unicode text literals
list: Ordered heterogeneous collection of Ion values
struct: Unordered collection of key/value pairs

Ion replaces JSON's loosely specified 'number' type with three strictly defined types:

int: Signed integers of arbitrary size
float: 64-bit IEEE binary-encoded floating point numbers
decimal: Decimal-encoded real numbers of arbitrary precision
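Ion's text rules make it possible to classify a JSON number literal mechanically, because the JSON grammar allows only a decimal point and an `e`/`E` exponent. A literal with an exponent is read as a float, one with a decimal point but no exponent is read as a decimal, and anything else is an int. The Python sketch below assumes its input is already a valid JSON number and uses Python's `int`, `decimal.Decimal` and `float` as stand-ins for the three Ion types. The helper name `ion_number_type` is hypothetical and does not come from any Ion library.

```python
from decimal import Decimal


def ion_number_type(literal: str):
    """Classify a JSON number literal by Ion's text-format rules.

    Hypothetical helper; assumes `literal` is a valid JSON number.
    Returns a tuple (ion_type_name, python_value).
    """
    if "e" in literal or "E" in literal:
        # An exponent makes it a 64-bit binary floating-point value.
        return "float", float(literal)
    if "." in literal:
        # A decimal point without an exponent makes it an exact decimal.
        return "decimal", Decimal(literal)
    # Otherwise it is an integer; Python ints, like Ion ints, are unbounded.
    return "int", int(literal)


if __name__ == "__main__":
    for text in ["42", "123456789012345678901234567890", "0.1", "31415e-4"]:
        print(text, "->", ion_number_type(text))
```

Reading `0.1` as a decimal avoids binary rounding. For example, `Decimal("0.1") + Decimal("0.2") == Decimal("0.3")` holds, while `0.1 + 0.2 == 0.3` is false for floats.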

Ion adds these types:

timestamp: Date/time/time zone moments of arbitrary precision
symbol: Unicode symbolic atoms (also known as identifiers)
blob: Binary data of user-defined encoding
clob: Text data of user-defined encoding
sexp: Ordered collections of values with application-defined semantics
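An Ion timestamp carries its own precision. A value may stop at the year, month, day, minute or second (or a fractional second), and any value with a time component also records a UTC offset. The Python sketch below renders a standard `datetime` at a chosen precision. The helper names are hypothetical. The sketch covers only whole-second precision and known offsets, although Ion also allows fractional seconds and `-00:00` for an unknown local offset.

```python
from datetime import datetime, timedelta, timezone


def _ion_offset(offset: timedelta) -> str:
    """Render a UTC offset as 'Z' or as '+hh:mm' / '-hh:mm'."""
    minutes = int(offset.total_seconds() // 60)
    if minutes == 0:
        return "Z"
    sign = "+" if minutes > 0 else "-"
    minutes = abs(minutes)
    return f"{sign}{minutes // 60:02d}:{minutes % 60:02d}"


def ion_timestamp_text(dt: datetime, precision: str) -> str:
    """Format `dt` as Ion timestamp text at the given precision.

    Hypothetical helper. Supports 'year', 'month', 'day', 'minute' and
    'second'; fractional seconds and the unknown-offset form are not handled.
    """
    date = f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}"
    if precision == "year":
        return f"{dt.year:04d}T"
    if precision == "month":
        return f"{dt.year:04d}-{dt.month:02d}T"
    if precision == "day":
        return date
    offset = dt.utcoffset()
    if offset is None:
        # Any timestamp with a time component must carry an offset.
        raise ValueError("timestamps with a time component need a UTC offset")
    hours_minutes = f"{date}T{dt.hour:02d}:{dt.minute:02d}"
    if precision == "minute":
        return hours_minutes + _ion_offset(offset)
    if precision == "second":
        return f"{hours_minutes}:{dt.second:02d}{_ion_offset(offset)}"
    raise ValueError(f"unsupported precision: {precision}")
```

Consider the moment 23 February 2007, 12:14:33 at UTC-8. Depending on the requested precision, it renders as `2007T`, `2007-02T`, `2007-02-23`, `2007-02-23T12:14-08:00` or `2007-02-23T12:14:33-08:00`. Precision is therefore part of the value, not only of its formatting.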

Each Ion type supports a null variant, indicating a lack of value while maintaining a strict type (e.g., null.int, null.struct).
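A typed null can be modeled as a value that carries a type name but no payload. This lets a reader tell an absent int apart from an absent string. The minimal Python model below renders the text form used above. The `TypedNull` class is hypothetical and is not taken from an Ion library.

```python
from dataclasses import dataclass

# The Ion type names listed in this article.
ION_TYPES = frozenset({
    "null", "bool", "int", "float", "decimal", "timestamp", "string",
    "symbol", "blob", "clob", "struct", "list", "sexp",
})


@dataclass(frozen=True)
class TypedNull:
    """A null that still records which Ion type it belongs to (hypothetical model)."""
    ion_type: str

    def __post_init__(self):
        if self.ion_type not in ION_TYPES:
            raise ValueError(f"unknown Ion type: {self.ion_type}")

    def to_text(self) -> str:
        # Plain `null` denotes the same value as `null.null`.
        return "null" if self.ion_type == "null" else f"null.{self.ion_type}"
```

Because the type is part of the value, `TypedNull("int")` and `TypedNull("string")` compare unequal. JSON, by contrast, would collapse both to a single `null`.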

Any Ion value may be annotated with one or more symbols. Such annotations may serve as metadata for otherwise opaque data (such as a blob).
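In Ion text, each annotation is written as a symbol followed by `::` in front of the value it describes, and several annotations may be stacked on one value. The Python sketch below renders a value together with its annotations in that form. Its helper names are hypothetical. It writes a symbol without quotes only when the symbol is a plain identifier that is not a keyword, and it handles only booleans, integers, strings and lists.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Tuple

# Ion identifiers: letters, digits, '_' and '$', not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Identifiers that collide with keywords must be quoted to be read as symbols.
_KEYWORDS = {"null", "true", "false", "nan"}


def symbol_text(name: str) -> str:
    """Write a symbol bare if it is a plain identifier, otherwise single-quoted."""
    if _IDENTIFIER.fullmatch(name) and name not in _KEYWORDS:
        return name
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def value_text(value: Any) -> str:
    """Render a few value kinds as Ion text."""
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return json.dumps(value)  # a JSON string literal is also a valid Ion string
    if isinstance(value, list):
        return "[" + ", ".join(value_text(v) for v in value) + "]"
    raise TypeError(f"unsupported value: {value!r}")


@dataclass(frozen=True)
class Annotated:
    """A value plus an ordered sequence of annotation symbols (hypothetical model)."""
    annotations: Tuple[str, ...]
    value: Any

    def to_text(self) -> str:
        prefix = "".join(symbol_text(a) + "::" for a in self.annotations)
        return prefix + value_text(self.value)
```

For example, `Annotated(("hw grades",), [80, 85, 90]).to_text()` produces `'hw grades'::[80, 85, 90]`. The annotation contains a space, so it must be quoted.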


Examples

Sample document

// comments are allowed in Ion files using the double forward slash
{
  key: "value",  // key here is a symbol; it can also be a string, as in JSON
  nums: 1_000_000,  // equivalent to 1000000; underscores make long numbers more readable
  'A float value': 31415e-4,  // key is a quoted symbol that contains spaces
  "An int value": null.int,  // a typed null: an int with no value
  annotated: age::35,  // age here is the annotation on the number 35
  lists: 'hw grades'::[80, 85, 90],  // any symbol can be used as an annotation
  many_annot: I::have::many::annotations::true,  // annotations are not nested, but rather a list of annotations
  sexp: (this (is a [valid] "Ion") last::value 42),  // Ion S-expressions
  _value: {{OiBTIKUgTyAASb8=}},  // a blob (base64-encoded binary data)
  _value: {{"a b"}}  // a clob (text in a user-defined encoding)
}

Uses

Amazon's Quantum Ledger Database (QLDB) stores data in Ion documents.^[1]
PartiQL, an open-source SQL-based query language also developed by Amazon, is built upon Ion. PartiQL queries are supported by QLDB and S3 Select.^[2]

Tooling and extensions

Ion Path Extractor API aims to combine the convenience of a DOM API with the speed of a streaming API.
IDE support
- Eclipse
- IntelliJ
Jackson data format module for Ion
Apache Hive SerDe for Ion
Ion Schema
- Specification
- Implementations
Ion Hash defines an algorithm for constructing a hash for any Ion value.
- Specification
- Implementations
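Hashing Ion data is not as simple as hashing its serialized bytes. The same value can be written in text or binary form, with different whitespace, and with struct fields in any order. The sketch below is NOT the Ion Hash algorithm. It is a simplified, hypothetical illustration of one problem such an algorithm has to address: struct fields are unordered. Here each field is hashed separately and the field digests are sorted, so field order does not change the result, while list order still does. The sketch uses Python's `hashlib.sha256`, with plain Python values standing in for Ion values.

```python
import hashlib
from typing import Any


def _digest(tag: bytes, payload: bytes) -> bytes:
    # Prefix each value with a type tag so that, e.g., 1 and "1" hash differently.
    return hashlib.sha256(tag + b"\x00" + payload).digest()


def simple_value_hash(value: Any) -> bytes:
    """Illustrative hash that ignores struct field order (NOT the Ion Hash spec)."""
    if isinstance(value, dict):
        # Structs are unordered: hash each (name, value) pair, then sort the digests.
        fields = sorted(
            _digest(b"field", simple_value_hash(name) + simple_value_hash(v))
            for name, v in value.items()
        )
        return _digest(b"struct", b"".join(fields))
    if isinstance(value, list):
        # Lists are ordered: concatenate element digests in order.
        return _digest(b"list", b"".join(simple_value_hash(v) for v in value))
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return _digest(b"bool", b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return _digest(b"int", str(value).encode("ascii"))
    if isinstance(value, str):
        return _digest(b"string", value.encode("utf-8"))
    raise TypeError(f"unsupported value: {value!r}")
```

Python dictionaries cannot hold repeated keys, so this model cannot represent Ion structs with duplicate field names. It also ignores annotations and typed nulls. An algorithm that covers any Ion value has to handle those cases as well.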

References

1. "We are the Amazon Managed Blockchain and Amazon QLDB Teams – Ask the AWS Experts – November 29 @ 3PM PST / 6PM EST". 28 November 2018.
2. "Announcing PartiQL: One query language for all your data". August 2019.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Data exchange formats
Human readable	Atom, CSV, EDIFACT, JSON (Web Encryption, Web Token, Web Signature), Property list, RDF, Rebol, TOML, XML, YAML
Binary	AMF, Ascii85, ASN.1, SMI, Avro, Base32, Base64, Bencode, BSON, UBJSON, Cap'n Proto, CBOR, FlatBuffers, MessagePack, Property list, Protocol Buffers, Thrift, Cyphal DSDL, XDR, uuencode, yEnc
Comparison of data-serialization formats