This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
Name | Creator-maintainer | Based on | Standardized?[definition needed] | Specification | Binary? | Human-readable? | Supports references?[e] | Schema-IDL? | Standard APIs | Supports zero-copy operations |
---|---|---|---|---|---|---|---|---|---|---|
Apache Arrow | Apache Software Foundation | — | De facto | Arrow Columnar Format | Yes | No | Yes | Built-in | C, C++, C#, Go, Java, JavaScript, Julia, Matlab, Python, R, Ruby, Rust, Swift | Yes |
Apache Avro | Apache Software Foundation | — | No | Apache Avro™ Specification | Yes | Partial[g] | — | Built-in | C, C#, C++, Java, PHP, Python, Ruby | — |
Apache Parquet | Apache Software Foundation | — | No | Apache Parquet | Yes | No | No | — | Java, Python, C++ | No |
Apache Thrift | Facebook (creator) Apache (maintainer) | — | No | Original whitepaper | Yes | Partial[c] | No | Built-in | C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages [1] | — |
ASN.1 | ISO, IEC, ITU-T | — | Yes | ISO/IEC 8824 / ITU-T X.680 (syntax) and ISO/IEC 8825 / ITU-T X.690 (encoding rules) series. X.680, X.681, and X.683 define syntax and semantics. | BER, DER, PER, OER, or custom via ECN | XER, JER, GSER, or custom via ECN | Yes[f] | Built-in | — | OER |
Bencode | Bram Cohen (creator) BitTorrent, Inc. (maintainer) | — | De facto as BEP | Part of BitTorrent protocol specification | Except numbers and delimiters, being ASCII | No | No | No | No | No |
BSON | MongoDB | JSON | No | BSON Specification | Yes | No | No | No | No | No |
Cap'n Proto | Kenton Varda | — | No | Cap'n Proto Encoding Spec | Yes | Partial[h] | No | Yes | No | Yes |
CBOR | Carsten Bormann, P. Hoffman | MessagePack [2] | Yes | RFC 8949 | Yes | No | Yes, through tagging | CDDL | FIDO2 | No |
Comma-separated values (CSV) | RFC author: Yakov Shafranovich | — | Myriad informal variants | RFC 4180 (among others) | No | Yes | No | No | No | No |
Common Data Representation (CDR) | Object Management Group | — | Yes | General Inter-ORB Protocol | Yes | No | Yes | Yes | Ada, C, C++, Java, Cobol, Lisp, Python, Ruby, Smalltalk | — |
D-Bus Message Protocol | freedesktop.org | — | Yes | D-Bus Specification | Yes | No | No | Partial (Signature strings) | Yes | — |
Efficient XML Interchange (EXI) | W3C | XML, Efficient XML | Yes | Efficient XML Interchange (EXI) Format 1.0 | Yes | XML | XPointer, XPath | XML Schema | DOM, SAX, StAX, XQuery, XPath | — |
Extensible Data Notation (edn) | Rich Hickey / Clojure community | Clojure | Yes | Official edn spec | No | Yes | No | No | Clojure, Ruby, Go, C++, Javascript, Java, CLR, ObjC, Python [3] | No |
FlatBuffers | Google | — | No | Flatbuffers GitHub | Yes | Apache Arrow | Partial (internal to the buffer) | Yes | C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript | Yes |
Fast Infoset | ISO, IEC, ITU-T | XML | Yes | ITU-T X.891 and ISO/IEC 24824-1:2007 | Yes | No | XPointer, XPath | XML schema | DOM, SAX, XQuery, XPath | — |
FHIR | Health Level 7 | REST basics | Yes | Fast Healthcare Interoperability Resources | Yes | Yes | Yes | Yes | Hapi for FHIR [4] JSON, XML, Turtle | No |
Ion | Amazon | JSON | No | The Amazon Ion Specification | Yes | Yes | No | Ion schema | C, C#, Go, Java, JavaScript, Python, Rust | — |
Java serialization | Oracle Corporation | — | Yes | Java Object Serialization | Yes | No | Yes | No | Yes | — |
JSON | Douglas Crockford | JavaScript syntax | Yes | STD 90/RFC 8259 (ancillary: RFC 6901, RFC 6902), ECMA-404, ISO/IEC 21778:2017 | No, but see BSON, Smile, UBJSON | Yes | JSON Pointer (RFC 6901), or alternately, JSONPath, JPath, JSPON, json:select(); and JSON-LD | Partial (JSON Schema Proposal, ASN.1 with JER, Kwalify, Rx, JSON-LD) | Partial (Clarinet, JSONQuery / RQL, JSONPath), JSON-LD | No |
MessagePack | Sadayuki Furuhashi | JSON (loosely) | No | MessagePack format specification | Yes | No | No | No | No | Yes |
Netstrings | Dan Bernstein | — | No | netstrings.txt | Except ASCII delimiters | Yes | No | No | No | Yes |
OGDL | Rolf Veen | ? | No | Specification | Binary specification | Yes | Path specification | Schema WD | — | |
OPC-UA Binary | OPC Foundation | — | No | opcfoundation.org | Yes | No | Yes | No | No | — |
OpenDDL | Eric Lengyel | C, PHP | No | OpenDDL.org | No | Yes | Yes | No | OpenDDL library | — |
PHP serialization format | PHP Group | — | Yes | No | Yes | Yes | Yes | No | Yes | — |
Pickle (Python) | Guido van Rossum | Python | De facto as PEPs | PEP 3154 – Pickle protocol version 4 | Yes | No | Yes [5] | No | Yes | No |
Property list | NeXT (creator) Apple (maintainer) | ? | Partial | Public DTD for XML format | Yes[a] | Yes[b] | No | ? | Cocoa, CoreFoundation, OpenStep, GnuStep | No |
Protocol Buffers (protobuf) | Google | — | No | Developer Guide: Encoding, proto2 specification, and proto3 specification | Yes | Yes[d] | No | Built-in | C++, Java, C#, Python, Go, Ruby, Objective-C, C, Dart, Perl, PHP, R, Rust, Scala, Swift, Julia, Erlang, D, Haskell, ActionScript, Delphi, Elixir, Elm, GopherJS, Haxe, JavaScript, Kotlin, Lua, Matlab, Mercury, OCaml, Prolog, Solidity, TypeScript, Vala, Visual Basic | No |
S-expressions | John McCarthy (original) Ron Rivest (internet draft) | Lisp, Netstrings | Largely de facto | "S-Expressions" Internet Draft | Yes, canonical representation | Yes, advanced transport representation | No | No | — | |
Smile | Tatu Saloranta | JSON | No | Smile Format Specification | Yes | No | Yes | Partial (JSON Schema Proposal, other JSON schemas/IDLs) | Partial (via JSON APIs implemented with Smile backend, on Jackson, Python) | — |
SOAP | W3C | XML | Yes | W3C Recommendations: SOAP/1.1 SOAP/1.2 | Partial (Efficient XML Interchange, Binary XML, Fast Infoset, MTOM, XSD base64 data) | Yes | Built-in id/ref, XPointer, XPath | WSDL, XML schema | DOM, SAX, XQuery, XPath | — |
Structured Data eXchange Formats | Max Wildgrube | — | Yes | RFC 3072 | Yes | No | No | No | — | |
UBJSON | The Buzz Media, LLC | JSON, BSON | No | ubjson.org | Yes | No | No | No | No | — |
eXternal Data Representation (XDR) | Sun Microsystems (creator) IETF (maintainer) | — | Yes | STD 67/RFC 4506 | Yes | No | Yes | Yes | Yes | — |
XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition) 1.1 (Second Edition) | Partial (Efficient XML Interchange, Binary XML, Fast Infoset, XSD base64 data) | Yes | XPointer, XPath | XML schema, RELAX NG | DOM, SAX, XQuery, XPath | — |
XML-RPC | Dave Winer [6] | XML | No | XML-RPC Specification | No | Yes | No | No | No | No |
YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON [7] | No | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) | No | No |
Format | Null | Boolean true | Boolean false | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|---|
ASN.1 (XML Encoding Rules) | <foo /> | <foo>true</foo> | <foo>false</foo> | <foo>685230</foo> | <foo>6.8523015e+5</foo> | <foo>A to Z</foo> | <SeqOfUnrelatedDatatypes><isMarried>true</isMarried><hobby/><velocity>-42.1e7</velocity><bookname>A to Z</bookname><bookname>We said, "no".</bookname></SeqOfUnrelatedDatatypes> | An object (the key is a field name): <person><isMarried>true</isMarried><hobby/><height>1.85</height><name>Bob Peterson</name></person> A data mapping (the key is a data value): <competition><measurement><name>John</name><height>3.14</height></measurement><measurement><name>Jane</name><height>2.718</height></measurement></competition> |
CSV[b] | null[a] (or an empty element in the row)[a] | 1[a] true[a] | 0[a] false[a] | 685230 -685230[a] | 6.8523015e+5[a] | A to Z "We said, ""no""." | true,,-42.1e7,"A to Z" | 42,1 A to Z,1,2,3 |
edn | nil | true | false | 685230 -685230 | 6.8523015e+5 | "A to Z" , "A \"up to\" Z" | [true nil -42.1e7 "A to Z"] | {:kw 1, "42" true, "A to Z" [1 2 3]} |
Ion | null | true | false | 685230 -685230 0xA74AE 0b10100111010010101110 | 6.8523015e5 | "A to Z" '''A to Z''' | [true,null,-42.1e7,"A to Z"] | {'42':true,'A to Z':[1,2,3]} |
Netstrings[c] | 0:,[a] 4:null,[a] | 1:1,[a] 4:true,[a] | 1:0,[a] 5:false,[a] | 6:685230,[a] | 9:6.8523e+5,[a] | 6:A to Z, | 29:4:true,0:,7:-42.1e7,6:A to Z,, | 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,[a] |
JSON | null | true | false | 685230 -685230 | 6.8523015e+5 | "A to Z" | [true,null,-42.1e7,"A to Z"] | {"42":true,"A to Z":[1,2,3]} |
OGDL[verification needed] | null[a] | true[a] | false[a] | 685230[a] | 6.8523015e+5[a] | "A to Z" 'A to Z' NoSpaces | true null -42.1e7 "A to Z" | 42 true "A to Z" 1 2 3 42 true "A to Z", (1, 2, 3) |
OpenDDL | ref {null} | bool {true} | bool {false} | int32 {685230} int32 {0x74AE} int32 {0b111010010101110} | float {6.8523015e+5} | string {"A to Z"} | Homogeneous array: int32 {1, 2, 3, 4, 5} Heterogeneous array: array { bool {true} ref {null} float {-42.1e7} string {"A to Z"} } | dict { value (key = "42") {bool {true}} value (key = "A to Z") {int32 {1, 2, 3}} } |
PHP serialization format | N; | b:1; | b:0; | i:685230; i:-685230; | d:685230.15;[d] d:INF; d:-INF; d:NAN; | s:6:"A to Z"; | a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} | Associative array: a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}} Object: O:8:"stdClass":2:{s:4:"John";d:3.14;s:4:"Jane";d:2.718;}[d] |
Pickle (Python) | N. | I01\n. | I00\n. | I685230\n. | F685230.15\n. | S'A to Z'\n. | (lI01\naNaF-421000000.0\naS'A to Z'\na. | (dI42\nI01\nsS'A to Z'\n(lI1\naI2\naI3\nas. |
Property list (plain text format) [11] | — | <*BY> | <*BN> | <*I685230> | <*R6.8523015e+5> | "A to Z" | ( <*BY>, <*R-42.1e7>, "A to Z" ) | { "42" = <*BY>; "A to Z" = ( <*I1>, <*I2>, <*I3> ); } |
Property list (XML format) [12] | — | <true /> | <false /> | <integer>685230</integer> | <real>6.8523015e+5</real> | <string>A to Z</string> | <array><true/><real>-42.1e7</real><string>A to Z</string></array> | <dict><key>42</key><true/><key>A to Z</key><array><integer>1</integer><integer>2</integer><integer>3</integer></array></dict> |
Protocol Buffers | — | true | false | 685230 -685230 | 20.0855369 | "A to Z" | field1: "value1" field1: "value2" field1: "value3" anotherfield { foo: 123 bar: 456 } anotherfield { foo: 222 bar: 333 } | thing1: "blahblah" thing2: 18923743 thing3: -44 thing4 { submessage_field1: "foo" submessage_field2: false } enumeratedThing: SomeEnumeratedValue thing5: 123.456 [extensionFieldFoo]: "etc" [extensionFieldThatIsAnEnum]: EnumValue |
S-expressions | NIL nil | T #t f true | NIL #f f false | 685230 | 6.8523015e+5 | abc "abc" #616263# 3:abc {MzphYmM=} |YWJj| | (T NIL -42.1e7 "A to Z") | ((42 T) ("A to Z" (1 2 3))) |
YAML | ~ null Null NULL [13] | y Y yes Yes YES on On ON true True TRUE [14] | n N no No NO off Off OFF false False FALSE [14] | 685230 +685_230 -685230 02472256 0x_0A_74_AE 0b1010_0111_0100_1010_1110 190:20:30 [15] | 6.8523015e+5 685.230_15e+03 685_230.15 190:20:30.15 .inf -.inf .Inf .INF .NaN .nan .NAN [16] | A to Z "A to Z" 'A to Z' | [y, ~, -42.1e7, "A to Z"] - y - - -42.1e7 - A to Z | {"John":3.14, "Jane":2.718} 42: y A to Z: [1, 2, 3] |
XML[e] and SOAP | <null />[a] | true | false | 685230 | 6.8523015e+5 | A to Z | <item>true</item><item xsi:nil="true"/><item>-42.1e7</item><item>A to Z</item> | <map><entry key="42">true</entry><entry key="A to Z"><item val="1"/><item val="2"/><item val="3"/></entry></map> |
XML-RPC | — | <value><boolean>1</boolean></value> | <value><boolean>0</boolean></value> | <value><int>685230</int></value> | <value><double>6.8523015e+5</double></value> | <value><string>A to Z</string></value> | <value><array><data><value><boolean>1</boolean></value><value><double>-42.1e7</double></value><value><string>A to Z</string></value></data></array></value> | <value><struct><member><name>42</name><value><boolean>1</boolean></value></member><member><name>A to Z</name><value><array><data><value><int>1</int></value><value><int>2</int></value><value><int>3</int></value></data></array></value></member></struct></value> |
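The Netstrings row above nests values by wrapping already-encoded netstrings inside an outer netstring, so the length prefix of the outer item counts the bytes of everything inside it. The following minimal Python sketch reproduces the Array and Associative array cells of that row. The helper names `netstring`, `encode`, and `decode_one` are hypothetical, and two conventions are assumptions of this sketch rather than part of the Netstrings specification: scalars are passed in as the text the table uses (an empty string standing for null), and a map is written as a list of `[key, value]` pairs.

```python
def netstring(payload: bytes) -> bytes:
    """Wrap raw bytes as <decimal length>:<payload>, ; the length counts payload octets only."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode(value) -> bytes:
    """Assumed nesting convention: a list becomes a netstring whose payload is its items' netstrings."""
    if isinstance(value, list):
        return netstring(b"".join(encode(item) for item in value))
    if isinstance(value, str):
        value = value.encode("utf-8")
    return netstring(value)


def decode_one(data: bytes) -> tuple:
    """Split one netstring off the front of data and return (payload, remaining bytes)."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("malformed length prefix")
    size = int(length_text)
    if rest[size:size + 1] != b",":
        raise ValueError("payload not followed by ','")
    return rest[:size], rest[size + 1:]


# Array cell: scalars are supplied as the table's text ("" stands in for null).
array_cell = encode(["true", "", "-42.1e7", "A to Z"])
# Associative array cell: each entry is a [key, value] pair.
map_cell = encode([["42", "1"], ["A to Z", ["1", "2", "3"]]])
print(array_cell)  # b'29:4:true,0:,7:-42.1e7,6:A to Z,,'
print(map_cell)
```

Because every item declares its own length, a reader never scans for delimiters inside the payload; calling `decode_one` repeatedly on a payload peels off its nested items one at a time.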
Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/object |
---|---|---|---|---|---|---|---|
ASN.1 (BER, PER or OER encoding) | NULL type | BOOLEAN: | INTEGER: | REAL: | Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) | Data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) | User definable type |
BSON | \x0A (1 byte) | True: \x08\x01 False: \x08\x00 (2 bytes) | int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement | Double: little-endian binary64 | UTF-8-encoded, preceded by int32-encoded string length in bytes | BSON embedded document with numeric keys | BSON embedded document |
Concise Binary Object Representation (CBOR) | \xf6 (1 byte) | True: \xf5 False: \xf4 (1 byte) | | | | | |
Efficient XML Interchange (EXI)[a] (Unpreserved lexical values format) | xsi:nil is not allowed in binary context. | 1–2 bit integer interpreted as boolean. | Boolean sign, plus arbitrary length 7-bit octets, parsed until most-significant bit is 0, in little-endian. The schema can set the zero-point to any arbitrary number. Unsigned skips the boolean flag. | | Length prefixed integer-encoded Unicode. Integers may represent enumerations or string table entries instead. | Length prefixed set of items. | Not in protocol. |
FlatBuffers | Encoded as absence of field in parent object | (1 byte) | Little-endian 2's complement signed and unsigned 8/16/32/64 bits | | UTF-8-encoded, preceded by 32-bit integer length of string in bytes | Vectors of any other type, preceded by 32-bit integer length of number of elements | Tables (schema defined types) or Vectors sorted by key (maps / dictionaries) |
Ion[18] | \x0f[b] | | | | | \xbx Arbitrary length and overhead. Length in octets. | |
MessagePack | \xc0 | | | Typecode (1 byte) + IEEE single/double | encoding is unspecified[19] | | |
Netstrings [c] | Not in protocol. | Not in protocol. | Not in protocol. | Not in protocol. | Length-encoded as an ASCII string + ':' + data + ',' Length counts only octets between ':' and ',' | Not in protocol. | Not in protocol. |
OGDL Binary | |||||||
Property list (binary format) | |||||||
Protocol Buffers | | | | | UTF-8-encoded, preceded by varint-encoded integer length of string in bytes | Repeated value with the same tag or, for varint-encoded integers only, values packed contiguously and prefixed by tag and total byte length | — |
Smile | \x21 | | | IEEE single/double, BigDecimal | Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references | Arbitrary-length heterogeneous arrays with end-marker | Arbitrary-length key/value pairs with end-marker |
Structured Data eXchange Formats (SDXF) | | | Big-endian signed 24-bit or 32-bit integer | Big-endian IEEE double | Either UTF-8 or ISO 8859-1 encoded | List of elements with identical ID and size, preceded by array header with int16 length | Chunks can contain other chunks to arbitrary depth. |
Thrift | | | | | | | |
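The Protocol Buffers row above describes strings as UTF-8 bytes preceded by a varint length, and arrays as either repeated fields or a single "packed" field. The following minimal Python sketch shows the base-128 varint and the length-delimited layout (wire type 2) behind both. The function names are hypothetical, and the sketch deliberately rejects negative numbers rather than reproducing how protobuf encodes signed integer types.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low-order group first, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow
        else:
            out.append(group)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """The tag is itself a varint: field number shifted left 3 bits, OR the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): key, varint byte length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    return field_key(field_number, 2) + encode_varint(len(data)) + data


def encode_packed_varints(field_number: int, values: list) -> bytes:
    """Packed repeated field: one key, the total byte length, then the varints back to back."""
    body = b"".join(encode_varint(v) for v in values)
    return field_key(field_number, 2) + encode_varint(len(body)) + body


print(encode_varint(300).hex())                      # ac02
print(encode_string_field(1, "A to Z"))              # b'\nA to Z' wait-free check below
print(encode_packed_varints(4, [3, 270, 86942]).hex())
```

Packing saves one key per element, which is why it matters for long numeric arrays; a reader that knows the field is packed reads varints until the declared byte length is used up.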
In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of objects does not include any of their associated methods with which they were previously linked.
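Python's standard library illustrates why references make serialization harder. In the sketch below, `pickle` writes each object once and refers back to it afterwards, so shared and even self-referencing structures are rebuilt faithfully, while `json`, which has no reference syntax, duplicates shared data and refuses cycles. The variable names are illustrative only.

```python
import json
import pickle

shared = [1, 2, 3]
original = {"a": shared, "b": shared}  # two keys refer to ONE list object

via_pickle = pickle.loads(pickle.dumps(original))
via_json = json.loads(json.dumps(original))

# pickle keeps a memo of objects already written, so the aliasing survives the round trip.
print(via_pickle["a"] is via_pickle["b"])  # True
# JSON writes the list twice and reads it back as two independent copies.
print(via_json["a"] is via_json["b"])  # False

cycle = []
cycle.append(cycle)  # a list that contains itself
try:
    json.dumps(cycle)  # the json module detects the cycle and raises ValueError
except ValueError as exc:
    print("json refused:", exc)

restored = pickle.loads(pickle.dumps(cycle))
print(restored[0] is restored)  # True
```

The JSON copies are still equal in value, which is often enough; the difference only matters when code later mutates one alias and expects the other to change too.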
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.
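XML encodes structure and text but not data types, so a program must decide how each element's text maps back to a value. The following minimal sketch uses Python's standard `xml.etree.ElementTree` to write and read the `person` object shown in the syntax comparison table above. The functions `person_to_xml` and `xml_to_person` are hypothetical, and the typing rules (the literal text `true` for a boolean, `float()` for the height) are this sketch's own choices rather than anything XML itself defines.

```python
import xml.etree.ElementTree as ET


def person_to_xml(is_married: bool, hobby, height: float, name: str) -> str:
    """Write each field as a child element; XML itself only stores text."""
    root = ET.Element("person")
    ET.SubElement(root, "isMarried").text = "true" if is_married else "false"
    ET.SubElement(root, "hobby").text = hobby  # None produces an empty element
    ET.SubElement(root, "height").text = repr(height)
    ET.SubElement(root, "name").text = name
    return ET.tostring(root, encoding="unicode")


def xml_to_person(document: str) -> dict:
    """Rebuild typed values; the types come from this code, not from the document."""
    root = ET.fromstring(document)
    return {
        "is_married": root.findtext("isMarried") == "true",
        "hobby": root.find("hobby").text,  # an empty element has text None
        "height": float(root.findtext("height")),
        "name": root.findtext("name"),
    }


text = person_to_xml(True, None, 1.85, "Bob Peterson")
print(text)
print(xml_to_person(text))
```

In practice an XML Schema would carry the typing decisions that are hard-coded here.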
Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.
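An ASN.1 module only describes types; the bytes on the wire come from an encoding rule such as BER or DER (ITU-T X.690). As a concrete illustration, the following minimal Python sketch emits DER tag-length-value triples for three universal types: BOOLEAN (tag 1), INTEGER (tag 2), and NULL (tag 5). The function names are hypothetical, and only single-byte tags are handled.

```python
def der_length(n: int) -> bytes:
    """Short form below 128; otherwise 0x80 | byte count, followed by the big-endian length."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, content: bytes) -> bytes:
    """Tag, length, value: the layout shared by BER and DER."""
    return bytes([tag]) + der_length(len(content)) + content


def der_null() -> bytes:
    return der_tlv(0x05, b"")  # NULL has empty content


def der_boolean(value: bool) -> bytes:
    return der_tlv(0x01, b"\xff" if value else b"\x00")  # DER requires 0xFF for TRUE


def der_integer(n: int) -> bytes:
    """Minimal big-endian two's complement, as DER requires."""
    magnitude = n if n >= 0 else ~n
    size = magnitude.bit_length() // 8 + 1  # the extra byte's worth leaves room for the sign bit
    return der_tlv(0x02, n.to_bytes(size, "big", signed=True))


print(der_integer(685230).hex())  # 02030a74ae
print(der_boolean(True).hex())    # 0101ff
print(der_null().hex())           # 0500
```

DER differs from BER mainly in removing choices: BER would also accept any nonzero byte for TRUE or a non-minimal integer, whereas DER fixes one encoding per value, which is what makes it suitable for signatures in cryptography.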
YAML is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax that intentionally differs from Standard Generalized Markup Language (SGML). It uses Python-style indentation to indicate nesting and does not require quotes around most string values.
In computing, configuration files are files used to configure the parameters and initial settings for some computer programs or applications, server processes and operating system settings.
Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.
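Bencode has only four value types: integers written as `i<number>e`, byte strings written as `<length>:<bytes>`, lists wrapped in `l`…`e`, and dictionaries wrapped in `d`…`e` with keys sorted as raw byte strings. The following minimal Python encoder follows those rules; the function name `bencode` and the choice to accept Python `str` (encoded as UTF-8) are assumptions of this sketch.

```python
def bencode(value) -> bytes:
    """Encode int, str/bytes, list and dict; bencode has no boolean, float or null."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            raw_key = key.encode("utf-8") if isinstance(key, str) else key
            pairs.append((raw_key, item))
        pairs.sort(key=lambda pair: pair[0])  # keys must appear in sorted byte order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"spam": ["a", 42]}))  # b'd4:spaml1:ai42eee'
```

Sorting dictionary keys gives each value exactly one encoding, which BitTorrent relies on when it hashes the encoded info dictionary.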
Various binary formats have been proposed as compact representations for XML. Using a binary XML format generally reduces the verbosity of XML documents, thereby also reducing the cost of parsing, but hinders the use of ordinary text editors and third-party tools to view and edit the document. There are several competing formats, but none has yet emerged as a de facto standard, although the World Wide Web Consortium adopted EXI as a Recommendation on 10 March 2011.
JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of name–value pairs and arrays. It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers.
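JSON object names are always strings, so a round trip can change a value's shape. The following sketch uses Python's standard `json` module to write the same object as the JSON row of the syntax comparison table and shows that an integer key comes back as a string.

```python
import json

record = {42: True, "A to Z": [1, 2, 3]}

# Compact separators match the table's spelling exactly.
text = json.dumps(record, separators=(",", ":"))
print(text)  # {"42":true,"A to Z":[1,2,3]}

restored = json.loads(text)
print(restored)            # {'42': True, 'A to Z': [1, 2, 3]}
print(restored == record)  # False: the integer key 42 returned as the string "42"
```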
In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files.
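Python's standard `plistlib` module reads and writes both the XML and the binary property list formats, which makes the difference between them easy to inspect. The sketch below encodes the same dictionary used in the comparison tables in both formats and reads each back.

```python
import plistlib

value = {"42": True, "A to Z": [1, 2, 3]}

xml_plist = plistlib.dumps(value, fmt=plistlib.FMT_XML)
binary_plist = plistlib.dumps(value, fmt=plistlib.FMT_BINARY)

print(xml_plist.decode("utf-8"))
print(binary_plist[:8])  # binary property lists begin with the magic bytes b'bplist00'

# loads() detects the format from the header, so one call reads both encodings.
assert plistlib.loads(xml_plist) == value
assert plistlib.loads(binary_plist) == value

# Dictionary keys must be strings; an integer key is rejected rather than converted.
try:
    plistlib.dumps({42: True})
except TypeError as exc:
    print("rejected:", exc)
```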
XML Information Set is a W3C specification describing an abstract data model of an XML document in terms of a set of information items. The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document.
Data exchange is the process of taking data structured under a source schema and transforming it into a target schema, so that the target data is an accurate representation of the source data. Data exchange allows data to be shared between different computer programs.
Efficient XML Interchange (EXI) is a binary XML format for exchange of data on a computer network. It was developed by the W3C's Efficient Extensible Interchange Working Group and is one of the most prominent efforts to encode XML documents in a binary data format, rather than plain text. Using EXI format reduces the verbosity of XML documents as well as the cost of parsing. Improvements in the performance of writing (generating) content depend on the speed of the medium being written to and on the methods and quality of actual implementations. EXI is useful for
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema to structure the data that is being encoded. It has two different types of schema languages: one for human editing and another which is more machine-readable based on JSON.
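Because the reader already has the schema, Avro's binary encoding can omit field names and tags and simply write field values back to back in schema order; `int` and `long` use zig-zag varints, and strings are a length followed by UTF-8 bytes. The following minimal Python sketch writes one record under a hypothetical `Person` schema (written in Avro's JSON schema syntax). The function names are hypothetical, and only the `int`, `long`, and `string` types are covered.

```python
import json

# Hypothetical schema for illustration, in Avro's JSON schema syntax.
SCHEMA = json.loads("""
{"type": "record", "name": "Person",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age", "type": "int"}]}
""")


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)  # valid for the 64-bit range of Avro longs


def write_long(n: int) -> bytes:
    """Zig-zag, then base-128 varint (7 bits per byte, low-order group first)."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def write_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return write_long(len(data)) + data


WRITERS = {"int": write_long, "long": write_long, "string": write_string}


def write_record(record: dict, schema: dict = SCHEMA) -> bytes:
    """Fields are written in schema order with no names or tags in the output."""
    return b"".join(WRITERS[f["type"]](record[f["name"]]) for f in schema["fields"])


print(write_record({"name": "A to Z", "age": 42}))  # b'\x0cA to ZT'
```

The seven-byte result carries no trace of the field names, so a reader without the writer's schema cannot decode it; this is why Avro files store the schema alongside the data.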
Data Format Description Language is a modeling language for describing general text and binary data in a standard way. It was published as an Open Grid Forum Recommendation in February 2021, and in April 2024 was published as an ISO standard.
MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. MessagePack aims to be as compact and simple as possible. Implementations are available in a variety of languages, some as official libraries and others community-created, including C, C++, C#, D, Erlang, Go, Haskell, Java, JavaScript (Node.js), Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.
Smile is a computer data interchange format based on JSON. It can also be considered a binary serialization of the generic JSON data model, which means tools that operate on JSON may be used with Smile as well, as long as a proper encoder/decoder exists for the tool. The name comes from the 4-byte header, whose first two bytes are the smiley ":)", followed by a linefeed: a choice made to make it easier to recognize Smile-encoded data files using textual command-line tools.
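MessagePack gets its compactness from a one-byte type code that, for small values, also holds the value or length itself. The following minimal Python encoder covers nil, booleans, integers, 64-bit floats, strings, arrays, and maps; the name `pack_subset` is hypothetical, and binary data, extension types, and 32-bit floats are deliberately left out, so this is a sketch rather than a replacement for an official library.

```python
import struct


def pack_subset(obj) -> bytes:
    """Encode None, bool, int, float, str, list/tuple and dict with MessagePack type codes."""
    if obj is None:
        return b"\xc0"
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj < 0x80:
            return bytes([obj])  # positive fixint: the byte is the value
        if -32 <= obj < 0:
            return struct.pack(">b", obj)  # negative fixint (0xe0 to 0xff)
        if obj > 0:
            for code, fmt, limit in ((0xCC, ">B", 2**8), (0xCD, ">H", 2**16),
                                     (0xCE, ">I", 2**32), (0xCF, ">Q", 2**64)):
                if obj < limit:
                    return bytes([code]) + struct.pack(fmt, obj)
        else:
            for code, fmt, limit in ((0xD0, ">b", 2**7), (0xD1, ">h", 2**15),
                                     (0xD2, ">i", 2**31), (0xD3, ">q", 2**63)):
                if obj >= -limit:
                    return bytes([code]) + struct.pack(fmt, obj)
        raise OverflowError("integer does not fit in 64 bits")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float 64, big-endian
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            head = bytes([0xA0 | n])  # fixstr: length in the low 5 bits
        elif n < 2**8:
            head = b"\xd9" + struct.pack(">B", n)
        elif n < 2**16:
            head = b"\xda" + struct.pack(">H", n)
        else:
            head = b"\xdb" + struct.pack(">I", n)
        return head + data
    if isinstance(obj, (list, tuple)):
        n = len(obj)
        if n < 16:
            head = bytes([0x90 | n])  # fixarray
        elif n < 2**16:
            head = b"\xdc" + struct.pack(">H", n)
        else:
            head = b"\xdd" + struct.pack(">I", n)
        return head + b"".join(pack_subset(item) for item in obj)
    if isinstance(obj, dict):
        n = len(obj)
        if n < 16:
            head = bytes([0x80 | n])  # fixmap
        elif n < 2**16:
            head = b"\xde" + struct.pack(">H", n)
        else:
            head = b"\xdf" + struct.pack(">I", n)
        return head + b"".join(pack_subset(k) + pack_subset(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(pack_subset([True, None, "A to Z"]).hex())  # 93c3c0a6 followed by the string bytes
```

The `bool` checks come before the `int` branch because Python's `True` and `False` are also integers.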
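The readable header makes Smile data easy to sniff. The following minimal Python check looks only for the `:` `)` linefeed signature and requires that a fourth header byte be present; the name `looks_like_smile` is hypothetical, and the sketch does not interpret the fourth byte.

```python
SMILE_SIGNATURE = b":)\n"  # ':' ')' then a linefeed; one more header byte follows


def looks_like_smile(data: bytes) -> bool:
    """Cheap sniffing test on the first bytes of a file or stream; it does not decode anything."""
    return len(data) >= 4 and data[:3] == SMILE_SIGNATURE


print(looks_like_smile(b":)\n\x00"))    # True
print(looks_like_smile(b'{"a": 1}'))     # False
```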
JData is a lightweight data annotation and exchange open standard designed to represent general-purpose and scientific data structures using human-readable (text-based) JSON and (binary) UBJSON formats. The JData specification specifically aims at simplifying the exchange of hierarchical and complex data between programming languages, such as MATLAB, Python, and JavaScript. It defines a comprehensive list of JSON-compatible "name":value constructs to store a wide range of data structures, including scalars, N-dimensional arrays, sparse/complex-valued arrays, maps, tables, hashes, linked lists, trees and graphs, and supports optional data grouping and metadata for each data element. The generated data files are compatible with JSON/UBJSON specifications and can be readily processed by most existing parsers. JData-defined annotation keywords also permit storage of strongly-typed binary data streams in JSON, data compression, linking and referencing.
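The idea of annotating plain JSON can be sketched for a 2-D numeric array: the shape and element type travel as extra name/value pairs next to a flat data list, so any JSON parser can still read the file. In the minimal Python sketch below, the keyword names `_ArrayType_`, `_ArraySize_`, and `_ArrayData_` reflect our reading of the JData specification and should be checked against it, and row-major flattening is an assumption of this sketch; `annotate_matrix` and `restore_matrix` are hypothetical helpers.

```python
import json


def annotate_matrix(rows: list, array_type: str = "double") -> dict:
    """Describe a rectangular 2-D list as JData-style annotated JSON (assumed keywords)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("ragged input is not a matrix")
    flat = [x for row in rows for x in row]  # row-major flattening (assumption)
    return {"_ArrayType_": array_type, "_ArraySize_": [n_rows, n_cols], "_ArrayData_": flat}


def restore_matrix(obj: dict) -> list:
    """Undo annotate_matrix using the stored size."""
    n_rows, n_cols = obj["_ArraySize_"]
    data = obj["_ArrayData_"]
    return [data[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
text = json.dumps(annotate_matrix(matrix))
print(text)
print(restore_matrix(json.loads(text)) == matrix)  # True
```

A reader unaware of the annotations still sees valid JSON (a dictionary holding a flat list), which is the compatibility property the paragraph above describes.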