MessagePack

Last updated
MessagePack
Original author(s) Sadayuki Furuhashi
Repository
Written inVarious languages
Operating system Any
Platform Cross-platform
Type Data interchange
License Boost Software License
Website msgpack.org

MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. MessagePack aims to be as compact and simple as possible. The official implementation is available in a variety of languages, some official libraries and others community created, such as C, C++, C#, D, Erlang, Go, Haskell, Java, JavaScript (NodeJS), Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift. [1]

Contents

Data types and syntax

Data structures processed by MessagePack loosely correspond to those used in JSON format. They consist of the following element types:

Comparison to other formats

MessagePack is more compact than JSON, but imposes limitations on array and integer sizes. On the other hand, it allows binary data and non-UTF-8 encoded strings. In JSON, map keys have to be strings, but in MessagePack there is no such limitation and any type can be a map key, including types like maps and arrays, and, like YAML, numbers.

Compared to BSON, MessagePack is more space-efficient. BSON is designed for fast in-memory manipulation, whereas MessagePack is designed for efficient transmission over the wire. For example, BSON requires null terminators at the end of all strings and inserts string indexes for list elements, while MessagePack doesn't. BSON represents both arrays and maps internally as documents, which are maps, where an array is a map with keys as decimal strings counting up from 0. MessagePack on the other hand represents both maps and arrays as arrays, where each map key-value pair is contiguous, making odd items keys and even items values.

The Protocol Buffers format provides a significantly more compact transmission format than MessagePack because it doesn't transmit field names. However, while JSON and MessagePack aim to serialize arbitrary data structures with type tags, Protocol Buffers requires a schema to define the data types. Protocol Buffers compiler creates boilerplate code in the target language to facilitate integration of serialization into the application code; MessagePack returns only a dynamically typed data structure and provides no automatic structure checks.

MessagePack is referenced in RFC   7049 of CBOR.

See also

Related Research Articles

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of objects does not include any of their associated methods with which they were previously linked.

Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.

<span class="mw-page-title-main">Interface description language</span> Computer language used to describe a software components interface

An interface description language or interface definition language (IDL) is a generic term for a language that lets a program or object written in one language communicate with another program written in an unknown language. IDLs are usually used to describe data types and interfaces in a language-independent way, for example, between those written in C++ and those written in Java.

YAML is a human-readable data serialization language. It is commonly used for configuration files and in applications where data are being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax that intentionally differs from Standard Generalized Markup Language (SGML). It uses Python-style indentation to indicate nesting and does not require quotes around most string values.

Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.

SDXF is a data serialization format defined by RFC 3072. It allows arbitrary structured data of different types to be assembled in one file for exchanging between arbitrary computers.

<span class="mw-page-title-main">JSON</span> Open standard file format and data interchange

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers.

In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files.

A Canonical S-expression is a binary encoding form of a subset of general S-expression. It was designed for use in SPKI to retain the power of S-expressions and ensure canonical form for applications such as digital signatures while achieving the compactness of a binary form and maximizing the speed of parsing.

Thrift is an interface definition language and binary communication protocol used for defining and creating services for programming languages. It was developed by Facebook. Since 2020, it is an open source project in the Apache Software Foundation.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.

<span class="mw-page-title-main">Apache Avro</span> Open-source remote procedure call framework

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema to structure the data that is being encoded. It has two different types of schema languages: one for human editing and another which is more machine-readable based on JSON.

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary form directly imitating JSON, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.

Smile is a computer data interchange format based on JSON. It can also be considered a binary serialization of the generic JSON data model, which means tools that operate on JSON may be used with Smile as well, as long as a proper encoder/decoder exists for the tool. The name comes from the first 2 bytes of the 4 byte header, which consist of Smiley ":)" followed by a linefeed: a choice made to make it easier to recognize Smile-encoded data files using textual command-line tools.

Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON authored by Carsten Bormann. Bormann maintains, that CBOR is not named after himself. Like JSON it allows the transmission of data objects that contain name–value pairs, but in a more concise manner. This increases processing and transfer speeds at the cost of human readability. It is defined in IETF RFC 8949.

JData is a light-weight data annotation and exchange open-standard designed to represent general-purpose and scientific data structures using human-readable (text-based) JSON and (binary) UBJSON formats. JData specification specifically aims at simplifying exchange of hierarchical and complex data between programming languages, such as MATLAB, Python, JavaScript etc. It defines a comprehensive list of JSON-compatible "name":value constructs to store a wide range of data structures, including scalars, N-dimensional arrays, sparse/complex-valued arrays, maps, tables, hashes, linked lists, trees and graphs, and support optional data grouping and metadata for each data element. The generated data files are compatible with JSON/UBJSON specifications and can be readily processed by most existing parsers. JData-defined annotation keywords also permit storage of strongly-typed binary data streams in JSON, data compression, linking and referencing.

References

  1. "Languages" . Retrieved 4 Jan 2022.