Binary XML

Various binary formats have been proposed as compact representations for XML (Extensible Markup Language). Using a binary XML format generally reduces the verbosity of XML documents, thereby also reducing the cost of parsing, [1] but it hinders the use of ordinary text editors and third-party tools to view and edit the document. There are several competing formats, but none has yet emerged as a de facto standard, although the World Wide Web Consortium adopted EXI as a Recommendation on 10 March 2011. [2]

Binary XML is typically used in applications where the performance of standard XML is insufficient, but where the ability to convert the document to and from a form (XML) that is easily viewed and edited is still valued. Other advantages may include enabling random access and indexing of XML documents.
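The trade-off described above can be illustrated with a minimal sketch. The encoding below is a toy invented for this illustration and is not EXI, Fast Infoset, BiM or any other real format: the token values `START`, `TEXT` and `END`, the byte layout, and the helper names `encode` and `decode` are all assumptions. It replaces repeated tag names, attribute names and values with small integer references into a string table, and it can be converted back to ordinary XML. It ignores namespaces, comments and mixed content.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical token codes for this toy format (not taken from any standard).
START, TEXT, END = 1, 2, 3


def encode(xml_text: str) -> bytes:
    """Encode XML as a string table followed by a stream of integer tokens."""
    root = ET.fromstring(xml_text)
    table, index = [], {}
    body = bytearray()

    def ref(s: str) -> int:
        # Each distinct string is stored once; later uses cost two bytes.
        if s not in index:
            index[s] = len(table)
            table.append(s)
        return index[s]

    def walk(el: ET.Element) -> None:
        body.extend(struct.pack(">BHB", START, ref(el.tag), len(el.attrib)))
        for key, value in el.attrib.items():
            body.extend(struct.pack(">HH", ref(key), ref(value)))
        if el.text:
            body.extend(struct.pack(">BH", TEXT, ref(el.text)))
        for child in el:
            walk(child)
        body.append(END)

    walk(root)
    out = bytearray(struct.pack(">H", len(table)))
    for s in table:
        raw = s.encode("utf-8")
        out.extend(struct.pack(">H", len(raw)))
        out.extend(raw)
    return bytes(out + body)


def decode(data: bytes) -> ET.Element:
    """Rebuild an element tree from the toy binary form."""
    pos = 0
    (count,) = struct.unpack_from(">H", data, pos)
    pos += 2
    table = []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", data, pos)
        pos += 2
        table.append(data[pos:pos + n].decode("utf-8"))
        pos += n

    stack, root = [], None
    while pos < len(data):
        kind = data[pos]
        pos += 1
        if kind == START:
            tag, nattr = struct.unpack_from(">HB", data, pos)
            pos += 3
            attrib = {}
            for _ in range(nattr):
                k, v = struct.unpack_from(">HH", data, pos)
                pos += 4
                attrib[table[k]] = table[v]
            el = ET.Element(table[tag], attrib)
            if stack:
                stack[-1].append(el)
            else:
                root = el
            stack.append(el)
        elif kind == TEXT:
            (t,) = struct.unpack_from(">H", data, pos)
            pos += 2
            stack[-1].text = table[t]
        elif kind == END:
            stack.pop()
    return root


# Made-up sample data with the kind of repetition typical of XML payloads.
SAMPLE = "<readings>" + "".join(
    f'<reading sensor="s{i % 3}" unit="celsius"><value>{20 + i % 5}</value></reading>'
    for i in range(50)
) + "</readings>"

if __name__ == "__main__":
    blob = encode(SAMPLE)
    print(len(SAMPLE.encode("utf-8")), "bytes as text;", len(blob), "bytes encoded")
    print(ET.tostring(decode(blob), encoding="unicode")[:80])
```

The encoded bytes are smaller for this repetitive sample and need no tokenizing of angle brackets or quotes when read back, but they cannot be inspected in a text editor, which is the cost the paragraph above describes.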

The major challenge for binary XML is to create a single, widely adopted standard. The International Organization for Standardization (ISO) and the International Telecommunication Union (ITU) published the Fast Infoset standard in 2007 and 2005, respectively. Another standard (ISO/IEC 23001-1), known as Binary MPEG format for XML (BiM), was standardized by the ISO in 2001. BiM is used by many ETSI standards for digital TV and mobile TV. The Open Geospatial Consortium provides a Binary XML Encoding Specification (currently a Best Practice Paper) optimized for geo-related data (GML), as well as a benchmark comparing the performance of Fast Infoset, EXI, BXML and deflate when encoding and decoding AIXM. [3]

Alternatives to binary XML include applying traditional file compression methods (for example, gzip) to XML documents, or using an existing standard such as ASN.1. Traditional compression methods, however, offer only the advantage of reduced file size, without the advantage of decreased parsing time or random access. ASN.1/PER forms the basis of Fast Infoset, which is one binary XML standard. There are also hybrid approaches (e.g., VTD-XML) that attach a small index file to an XML document to eliminate the overhead of parsing. [4]
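As a minimal sketch of the compression alternative, the following uses Python's standard `gzip` and `xml.etree.ElementTree` modules. The sample document and the helper names `make_sample`, `compress_xml` and `parse_compressed` are assumptions made for illustration. It shows that gzip shrinks the stored bytes, while reading the data back still requires inflating the whole payload and then running an ordinary text XML parse.

```python
import gzip
import xml.etree.ElementTree as ET


def make_sample(n: int = 200) -> str:
    """Build a repetitive, made-up XML document for the comparison."""
    rows = "".join(
        f'<reading sensor="s{i % 3}" unit="celsius"><value>{20 + i % 5}</value></reading>'
        for i in range(n)
    )
    return f"<readings>{rows}</readings>"


def compress_xml(xml_text: str) -> bytes:
    """Compress the textual XML; the result is opaque bytes, not parsed data."""
    return gzip.compress(xml_text.encode("utf-8"))


def parse_compressed(blob: bytes) -> ET.Element:
    """Recover the tree: full decompression first, then a normal XML parse.

    Neither step can be skipped, so parsing cost is not reduced and there is
    no way to jump directly to one element inside the compressed stream.
    """
    return ET.fromstring(gzip.decompress(blob))


if __name__ == "__main__":
    text = make_sample()
    blob = compress_xml(text)
    print(len(text.encode("utf-8")), "bytes as text;", len(blob), "bytes gzipped")
    print(len(parse_compressed(blob)), "readings recovered")
```

The size reduction comes entirely from the general-purpose compressor, and the consumer still pays for decompression plus a complete text parse.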

Binary XML Efforts

Projects and file formats related to the notion of binary XML include:

Other projects that have functionality related to (or competing with) binary representations include:

Related Research Articles

Moving Picture Experts Group: Alliance of working groups to set standards for multimedia coding

The Moving Picture Experts Group (MPEG) is an alliance of working groups established jointly by ISO and IEC that sets standards for media coding, including compression coding of audio, video, graphics and genomic data, and transmission and file formats for various applications. Together with JPEG, MPEG is organized under ISO/IEC JTC 1/SC 29 – Coding of audio, picture, multimedia and hypermedia information.

Standard Generalized Markup Language: Markup language

The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":

XML: Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

JPEG 2000: Image compression standard and coding system

JPEG 2000 (JP2) is an image compression standard and coding system. It was developed from 1997 to 2000 by a Joint Photographic Experts Group committee chaired by Touradj Ebrahimi, with the intention of superseding their original JPEG standard, which is based on a discrete cosine transform (DCT), with a newly designed, wavelet-based method. The standardized filename extension is .jp2 for ISO/IEC 15444-1 conforming files and .jpx for the extended part-2 specifications, published as ISO/IEC 15444-2. The registered MIME types are defined in RFC 3745. For ISO/IEC 15444-1 it is image/jp2.

Advanced Video Coding: Most widely used standard for video compression

Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports resolutions up to and including 8K UHD.

A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers. There currently exist a multitude of incompatible document file formats.

The Extensible MPEG-4 Textual Format (XMT) is a high-level, XML-based file format for storing MPEG-4 data in a way suitable for further editing. In contrast, the more common MPEG-4 Part 14 (MP4) format is less flexible and used for distributing finished content.

Fast Infoset is an international standard that specifies a binary encoding format for the XML Information Set as an alternative to the XML document format. It aims to provide more efficient serialization than the text-based XML format.

MPEG-4 Part 11, Scene description and application engine, was published as ISO/IEC 14496-11 in 2005. MPEG-4 Part 11 is also known as BIFS, XMT, MPEG-J. It defines:

XML Information Set is a W3C specification describing an abstract data model of an XML document in terms of a set of information items. The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document.

Office Open XML is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma, and by the ISO and IEC in later versions.

The Video Coding Experts Group or Visual Coding Experts Group is a working group of the ITU Telecommunication Standardization Sector (ITU-T) concerned with video coding standards. It is responsible for standardization of the "H.26x" line of video coding standards, the "T.8xx" line of image coding standards, and related technologies.

BiM is an international standard defining a generic binary format for encoding XML documents.

Efficient XML Interchange (EXI) is a binary XML format for exchange of data on a computer network. It was developed by the W3C's Efficient Extensible Interchange Working Group and is one of the most prominent efforts to encode XML documents in a binary data format, rather than plain text. Using EXI format reduces the verbosity of XML documents as well as the cost of parsing. Improvements in the performance of writing (generating) content depend on the speed of the medium being written to and on the methods and quality of actual implementations. EXI is useful for

ISO/IEC base media file format (ISOBMFF) defines a general structure for time-based multimedia files such as video and audio. It is standardized in ISO/IEC 14496-12 – MPEG-4 Part 12. The text was also published as ISO/IEC 15444-12.

This is a comparison of data-serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations as well as specific formats for material such as mathematical formulae, graphics, bibliographies etc.

Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of cross-platform XML processing technologies centered on a non-extractive, "document-centric" XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as one of the following:

Motion JPEG 2000 is a file format for motion sequences of JPEG 2000 images and associated audio, based on the MP4 and QuickTime format. Filename extensions for Motion JPEG 2000 video files are .mj2 and .mjp2, as defined in RFC 3745.

References

  1. "The performance woe of binary XML". http://webservices.sys-con.com/read/250512.htm. Archived 2008-05-20 at the Wayback Machine.
  2. John Schneider, Takuki Kamiya, eds., "Efficient XML Interchange (EXI) Format 1.0", W3C Recommendation, 10 March 2011.
  3. "AIXM 5.1 compression benchmarking: how EXI, FI, BXML and deflate compete when dealing with geo-related data?".
  4. "Index XML documents with VTD-XML". Archived from the original on 2008-07-04. Retrieved 2007-11-28.
  5. "Where is Android binary XML format documented?". Reverse Engineering Stack Exchange.