Smile (data interchange format)

Last updated
Smile
Filename extension
.sml
Internet media type
application/x-jackson-smile (proposed)
Magic number 3a 29 ":)"
Type of formatData interchange
Extended from JSON
Standard no RFC yet
Website github.com/FasterXML/smile-format-specification

Smile is a computer data interchange format based on JSON. It can also be considered a binary serialization of the generic JSON data model, which means tools that operate on JSON may be used with Smile as well, as long as a proper encoder/decoder exists for the tool. The name comes from the first 2 bytes of the 4 byte header, which consist of Smiley ":)" followed by a linefeed: a choice made to make it easier to recognize Smile-encoded data files using textual command-line tools.

Contents

Efficiency

Compared to JSON, Smile is both more compact and more efficient to process (both to read and write). [1] Part of this is due to more efficient binary encoding (similar to BSON, CBOR and UBJSON), but an additional feature is optional use of back references for property names and values. [2] Back referencing allows replacing of property names and/or short (64 bytes or less) String values with 1- or 2-byte reference ids.

Implementations

Libraries known to support Smile include:

See also

Related Research Articles

In computing, serialization or serialisation is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from UnicodeTransformation Format – 8-bit.

Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

External Data Representation (XDR) is a standard data serialization format, for uses such as computer network protocols. It allows data to be transferred between different kinds of computer systems. Converting from the local representation to XDR is called encoding. Converting from XDR to the local representation is called decoding. XDR is implemented as a software library of functions which is portable between different operating systems and is also independent of the transport layer.

Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.

<span class="mw-page-title-main">JSON</span> Open standard file format and data interchange

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.

In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files.

Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous programming languages. It was developed at Facebook for "scalable cross-language services development" and as of 2020 is an open source project in the Apache Software Foundation.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.

<span class="mw-page-title-main">Apache Avro</span> Open-source remote procedure call framework

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema to structure the data that is being encoded. It has two different types of schema languages; one for human editing and another which is more machine-readable based on JSON.

MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. MessagePack aims to be as compact and simple as possible. The official implementation is available in a variety of languages such as C, C++, C#, D, Erlang, Go, Haskell, Java, JavaScript (NodeJS), Lua, OCaml, Perl, PHP, Python, Ruby, Scala, Smalltalk, and Swift.

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary form directly imitating JSON, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.

JSON streaming comprises communications protocols to delimit JSON objects built upon lower-level stream-oriented protocols, that ensures individual JSON objects are recognized, when the server and clients use the same one. This is necessary as JSON is a non-concatenative protocol.

Binn is a computer data serialization format used mainly for application data transfer. It stores primitive data types and data structures in a binary form.

Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON authored by C. Bormann. Like JSON it allows the transmission of data objects that contain name–value pairs, but in a more concise manner. This increases processing and transfer speeds at the cost of human readability. It is defined in IETF RFC 8949.

FlatBuffers is a free software library implementing a serialization format similar to Protocol Buffers, Thrift, Apache Avro, SBE, and Cap'n Proto, primarily written by Wouter van Oortmerssen and open-sourced by Google. It supports “zero-copy” deserialization, so that accessing the serialized data does not require first copying it into a separate part of memory. This makes accessing data in these formats much faster than data in formats requiring more extensive processing, such as JSON, CSV, and in many cases Protocol Buffers. Compared to other serialization formats however, the handling of FlatBuffers requires usually more code, and some operations are not possible.

JData is a light-weight data annotation and exchange open-standard designed to represent general-purpose and scientific data structures using human-readable (text-based) JSON and (binary) UBJSON formats. JData specification specifically aims at simplifying exchange of hierarchical and complex data between programming languages, such as MATLAB, Python, JavaScript etc. It defines a comprehensive list of JSON-compatible "name":value constructs to store a wide range of data structures, including scalars, N-dimensional arrays, sparse/complex-valued arrays, maps, tables, hashes, linked lists, trees and graphs, and support optional data grouping and metadata for each data element. The generated data files are compatible with JSON/UBJSON specifications and can be readily processed by most existing parsers. JData-defined annotation keywords also permit storage of strongly-typed binary data streams in JSON, data compression, linking and referencing.

References

  1. "JVM Serializer Benchmark (results)" . Retrieved 7 Jun 2014.
  2. "Shared String References in Smile" . Retrieved 7 Jun 2014.