UBJSON

Last updated
UBJSON
Original author(s) Riyad Kalla
Stable release
Draft 12
Written inVarious languages
Operating system Any
Platform Cross-platform
Type Data interchange
License Apache 2.0
Website ubjson.org

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary form directly imitating JSON, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.

Contents

Rationale and Objectives

UBJSON is a proposed successor to BSON, BJSON and others. UBJSON has the following goals:

Data types and syntax

UBJSON data can be either a value or a container.

Value types

UBJSON uses a single binary tuple to represent all JSON value types: [1]

   type [length] [data]

Each element in the tuple is defined as:

type

The type is a 1-byte ASCII character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types. There is also a no-op type used for stream keep-alive.

High-precision numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.

length (optional)

The length is an integer number (e.g. uint8, or int64) encoding the size of the data payload in bytes. It is used for strings, high-precision numbers and optionally containers. They are omitted for other types.

Length is encoded following the same convention as integers, thus including its own type. For example, the string hello is encoded as S,U,0x05,h,e,l,l,o.

data (optional)

A sequence of bytes representing the actual binary data for this type of value. All numbers are in big-endian order.

Container types

Similarly to JSON, UBJSON defines two container types: array and object. [2]

Arrays are ordered sequences of elements, represented as a [ followed by zero or more elements of value and container type and a trailing ].

Objects are labeled sets of elements, represented as a { followed by zero or more key-value pairs and a trailing }. Each key is a string with the S character omitted, and each "value" can be any element of value or container type.

Alternatively, arrays and objects may indicate the number of elements they contain as # followed by an integer number before their first element, in which case the trailing ] or } is omitted. Additionally, if all elements have the same type, the types can be omitted and replaced by a single $ followed by the type, in which case the element count must follow immediately. For example, the array ["a","b","c"] may be represented as [,$,C,#,U,0x03,a,b,c.

Binary data

While there is no explicit binary type, binary data is stored in a strongly typed array of uint8 values. This ensures binary efficiency while maintaining compatibility with JSON, even though JSON has no direct support for binary data. [3] [4]

Representation

The MIME type 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file-system. [4]

Software support

See also

Related Research Articles

Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.

<span class="mw-page-title-main">Data type</span> Attribute of data

In computer science and computer programming, a data type is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers, floating-point numbers, characters and Booleans. A data type constrains the possible values that an expression, such as a variable or a function, might take. This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

YAML is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax which intentionally differs from SGML. It uses both Python-style indentation to indicate nesting, and a more compact format that uses [...] for lists and {...} for maps thus JSON files are valid YAML 1.2.

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, "primitive data types" may refer to the standard data types built into a programming language. Data types which are not primitive are referred to as derived or composite.

Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.

<span class="mw-page-title-main">JSON</span> Open standard file format and data interchange

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files.

A Canonical S-expression is a binary encoding form of a subset of general S-expression. It was designed for use in SPKI to retain the power of S-expressions and ensure canonical form for applications such as digital signatures while achieving the compactness of a binary form and maximizing the speed of parsing.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

Data Format Description Language, published as an Open Grid Forum Proposed Recommendation in January 2011, is a modeling language for describing general text and binary data in a standard way. A DFDL model or schema allows any text or binary data to be read from its native format and to be presented as an instance of an information set.. The same DFDL schema also allows data to be taken from an instance of an information set and written out to its native format.

MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. MessagePack aims to be as compact and simple as possible. The official implementation is available in a variety of languages such as C, C++, C#, D, Erlang, Go, Haskell, Java, JavaScript (NodeJS), Lua, OCaml, Perl, PHP, Python, Ruby, Scala, Smalltalk, and Swift.

Binn is a computer data serialization format used mainly for application data transfer. It stores primitive data types and data structures in a binary form.

Ion is a data serialization language developed by Amazon. It may be represented by either a human-readable text form or a compact binary form. The text form is a superset of JSON; thus, any valid JSON document is also a valid Ion document.

Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON authored by C. Bormann. Like JSON it allows the transmission of data objects that contain name–value pairs, but in a more concise manner. This increases processing and transfer speeds at the cost of human readability. It is defined in IETF RFC 8949.

<span class="mw-page-title-main">JSON→URL</span> Text-based data interchange format designed for use in a URL query string

JSON→URL is a language-independent data interchange format for the JSON data model suitable for use within a URL/URI query string. It is defined by an open specification, though not through a standards body.

JData is a light-weight data annotation and exchange open-standard designed to represent general-purpose and scientific data structures using human-readable (text-based) JSON and (binary) UBJSON formats. JData specification specifically aims at simplifying exchange of hierarchical and complex data between programming languages, such as MATLAB, Python, JavaScript etc. It defines a comprehensive list of JSON-compatible "name":value constructs to store a wide range of data structures, including scalars, N-dimensional arrays, sparse/complex-valued arrays, maps, tables, hashes, linked lists, trees and graphs, and support optional data grouping and metadata for each data element. The generated data files are compatible with JSON/UBJSON specifications and can be readily processed by most existing parsers. JData-defined annotation keywords also permit storage of strongly-typed binary data streams in JSON, data compression, linking and referencing.

JMesh is a JSON-based portable and extensible file format for the storage and interchange of unstructured geometric data, including discretized geometries such as triangular and tetrahedral meshes, parametric geometries such as NURBS curves and surfaces, and constructive geometries such as constructive solid geometry (CGS) of shape primitives and meshes. Built upon the JData specification, a JMesh file utilizes the JSON and Universal Binary JSON (UBJSON) constructs to serialize and encode geometric data structures, therefore, it can be directly processed by most existing JSON and UBJSON parsers. The JMesh specification defines a list of JSON-compatible constructs to encode geometric data, including N-dimensional (ND) vertices, curves, surfaces, solid elements, shape primitives, their interactions and spatial relations, together with their associated properties, such as numerical values, colors, normals, materials, textures and other properties related to graphics data manipulation, 3-D fabrication, computer graphics rendering and animations.

References

  1. "Value Types | Universal Binary JSON Specification" . Retrieved 20 July 2019.
  2. "Container Types | Universal Binary JSON Specification" . Retrieved 20 July 2019.
  3. "Binary Data | Universal Binary JSON Specification" . Retrieved 20 July 2019.
  4. 1 2 3 "UBJSON (.ubj)Wolfram Language Documentation" . Retrieved 20 July 2019.
  5. "UBJSON Storage Format" . Retrieved 20 July 2019.