XML-binary Optimized Packaging


XML-binary Optimized Packaging (XOP) is a mechanism for serializing XML Information Sets (infosets) that contain binary data, and for deserializing them back into XML Information Sets.


Benefits

XOP allows the binary parts of an XML Infoset to be serialized without passing through the XML serializer. Because the XML serialization of an XML Infoset is text-based, binary data would otherwise have to be encoded as text, typically using base64. XOP avoids this by extracting the binary data from the XML Infoset, so that the remaining infoset contains no binary data and the extracted data can be serialized separately in its raw form.

Therefore, XOP can reduce the size of the serialization: base64 encoding adds approximately 33% to the size of the data, and that increase requires extra resources to transmit or store. Depending on how it is implemented, XOP might also allow processing efficiencies.
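The 33% figure follows from base64 mapping every 3 input octets to 4 output characters, with `=` padding on the last group. The following Python sketch checks that rule against the standard `base64` module; the payload sizes and the zero-filled payload are arbitrary choices for illustration.

```python
import base64
import math

def base64_size(n: int) -> int:
    """Characters produced by base64 for n octets (with '=' padding, no line breaks)."""
    return 4 * math.ceil(n / 3)

for n in (3, 1_000, 1_000_000):
    encoded = base64.b64encode(bytes(n))  # n zero octets as a stand-in payload
    assert len(encoded) == base64_size(n)
    print(f"{n} octets -> {len(encoded)} chars ({len(encoded) / n - 1:.1%} overhead)")
```

When base64 text is additionally wrapped into 76-character lines, as MIME does, the line breaks add a little more on top of this.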

Costs

XOP introduces another level of processing, and with it extra complexity and processing overhead.

Representing the XOP Package also adds some size overhead (with MIME packaging, for example, the part headers and the xop:Include elements). This overhead is negligible when the binary data is large, but can be significant when the binary data is small.
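The trade-off can be made concrete with a simple size model: leaving data inline costs its base64 length, while extracting it costs the raw octets plus a roughly fixed amount for the xop:Include element and the MIME part headers. This is a minimal Python sketch under that assumption; the Content-ID `part1@example.org`, the header set and the media type are hypothetical, and the model ignores processing cost entirely.

```python
import base64

# Hypothetical fixed cost of one extracted chunk in a MIME-based XOP Package:
# the xop:Include element that replaces the data in the XML ...
INCLUDE = (b'<xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" '
           b'href="cid:part1@example.org"/>')
# ... plus the boundary line and headers of the MIME part that carries the data.
PART_HEADERS = (
    b"\r\n--MIME_boundary\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"Content-ID: <part1@example.org>\r\n\r\n"
)

def inline_cost(data: bytes) -> int:
    """Octets the data occupies when left in the XML as base64 text."""
    return len(base64.b64encode(data))

def xop_cost(data: bytes) -> int:
    """Octets the data occupies when extracted into its own binary MIME part."""
    return len(INCLUDE) + len(PART_HEADERS) + len(data)

for n in (16, 1_000, 100_000):
    data = bytes(n)
    print(f"{n:>7} octets: inline {inline_cost(data):>7}, XOP {xop_cost(data):>7}")
```

With these particular headers the break-even point is a few hundred octets: below it the fixed cost dominates, above it the 33% base64 expansion does.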

Operation

XOP operates on a single XML Infoset.

The binary parts of the original XML Infoset are extracted, leaving an "XOP Infoset": essentially the original XML Infoset with the binary parts replaced by references to the extracted content. Each reference in the XOP Infoset is represented by an "xop:Include" element. The XOP Infoset plus the extracted content can be serialized into a representation called the "XOP Package", which can then be sent or stored.

To reconstitute the original XML Infoset, the XOP Package is deserialized into the XOP Infoset plus the extracted content, and each xop:Include element is then replaced by the content it references.
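The two steps above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not the algorithm from the XOP specification but an illustration under stated assumptions: an element carrying an `xmime:contentType` attribute is taken to hold base64 text in canonical form (no line breaks), and the `partN@example.org` Content-IDs are a hypothetical naming scheme.

```python
import base64
import xml.etree.ElementTree as ET

XOP_NS = "http://www.w3.org/2004/08/xop/include"
XMIME_NS = "http://www.w3.org/2005/05/xmlmime"
INCLUDE = f"{{{XOP_NS}}}Include"
CONTENT_TYPE = f"{{{XMIME_NS}}}contentType"

def extract(root: ET.Element) -> dict[str, bytes]:
    """Turn the tree into an XOP Infoset in place; return extracted chunks by Content-ID.

    Assumption: an element carrying xmime:contentType holds base64 text.
    """
    parts = {}
    # Collect targets first so the tree is not mutated while being iterated.
    targets = [e for e in root.iter() if e.get(CONTENT_TYPE) is not None]
    for n, elem in enumerate(targets):
        cid = f"part{n}@example.org"  # hypothetical Content-ID scheme
        parts[cid] = base64.b64decode((elem.text or "").strip())  # keep raw octets
        elem.text = None
        ET.SubElement(elem, INCLUDE, href=f"cid:{cid}")  # reference replaces the data
    return parts

def reconstitute(root: ET.Element, parts: dict[str, bytes]) -> None:
    """Replace each xop:Include with the base64 text of the chunk it references."""
    parents = [e for e in root.iter() if e.find(INCLUDE) is not None]
    for parent in parents:
        inc = parent.find(INCLUDE)
        cid = inc.get("href").removeprefix("cid:")  # Python 3.9+
        parent.remove(inc)
        parent.text = base64.b64encode(parts[cid]).decode("ascii")
```

Because the sketch assumes canonical base64 input, re-encoding in `reconstitute` restores exactly the original characters; input with embedded line breaks would not round-trip character for character.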

XOP Packages

XOP can be used with a number of different packaging mechanisms. A packaging mechanism defines how the XOP Infoset and the binary chunks are represented.

The XOP specification defines how MIME Multipart/Related can be used as a packaging mechanism. In that case, the XOP Infoset is serialized as XML in the root MIME part, and the binary chunks are carried in the other MIME parts. Those parts can contain raw binary data, so the chunks do not need the base64 encoding they would have required had they remained inside the XML Infoset.

XOP does not mandate the use of the MIME packaging mechanism, so other packaging mechanisms could be used.
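A MIME-based XOP Package can be assembled by hand to show its layout: a root part of type `application/xop+xml` holding the XOP Infoset, followed by one binary part per extracted chunk. The Python sketch below is an illustration, not a complete MIME implementation; the root Content-ID, the `start-info` value of `application/soap+xml` and the default boundary are assumptions, and a real implementation must pick a boundary that does not occur in any part.

```python
def build_xop_package(root_xml: bytes, parts: dict[str, tuple[str, bytes]],
                      boundary: str = "MIME_boundary") -> tuple[str, bytes]:
    """Return (Content-Type header value, body) for a Multipart/Related XOP Package.

    parts maps a Content-ID (without angle brackets) to (media type, raw octets).
    Assumes the boundary string appears in none of the parts.
    """
    root_cid = "root.message@example.org"  # hypothetical Content-ID of the root part
    content_type = (f'multipart/related; boundary="{boundary}"; '
                    f'type="application/xop+xml"; start="<{root_cid}>"; '
                    f'start-info="application/soap+xml"')
    delimiter = f"--{boundary}".encode("ascii")
    lines = [
        delimiter,
        b'Content-Type: application/xop+xml; charset=UTF-8; type="application/soap+xml"',
        b"Content-Transfer-Encoding: 8bit",
        f"Content-ID: <{root_cid}>".encode("ascii"),
        b"",        # blank line ends the part headers
        root_xml,   # the XOP Infoset, serialized as ordinary XML
    ]
    for cid, (media_type, data) in parts.items():
        lines += [
            delimiter,
            f"Content-Type: {media_type}".encode("ascii"),
            b"Content-Transfer-Encoding: binary",  # raw octets, no base64
            f"Content-ID: <{cid}>".encode("ascii"),
            b"",
            data,
        ]
    lines.append(delimiter + b"--")  # close delimiter ends the multipart body
    return content_type, b"\r\n".join(lines) + b"\r\n"
```

The returned header value goes in the outer `Content-Type` of the message or HTTP request; the body carries the PNG octets unchanged.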

Usage in SOAP Web services

MIME is the most widely used packaging mechanism, because XOP is mostly used with MTOM to represent SOAP messages.

For example:

```
MIME-Version: 1.0
Content-Type: Multipart/Related;boundary=MIME_boundary;
...
--MIME_boundary
Content-Type: application/xop+xml;
...

<soap:Envelope>
...
  <soap:Body>
  ...
    <m:photo xmlmime:contentType="image/png">
      <xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include"
          href="cid:http://example.org/me.png"/></m:photo>
...

--MIME_boundary
Content-Type: image/png
Content-Transfer-Encoding: binary
Content-ID: <http://example.org/me.png>

// binary octets for png
```
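In this example, the `href` value `cid:http://example.org/me.png` refers to the MIME part whose header is `Content-ID: <http://example.org/me.png>`: a `cid:` URL (RFC 2392) is the Content-ID without its angle brackets, with URL escaping. A receiver therefore has to map each xop:Include back to a part header. The Python sketch below does that mapping with `xml.etree.ElementTree` and `urllib.parse.unquote`; the function names are hypothetical, and the sketch only resolves references without reading the parts themselves.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

XOP_INCLUDE = "{http://www.w3.org/2004/08/xop/include}Include"

def content_id_for(href: str) -> str:
    """Map a cid: URL to the Content-ID header value it refers to."""
    if not href.lower().startswith("cid:"):
        raise ValueError(f"not a cid: URL: {href!r}")
    return f"<{unquote(href[4:])}>"  # undo URL escaping, restore angle brackets

def include_refs(root_xml: str) -> list[str]:
    """List the Content-IDs referenced by xop:Include elements in the root part."""
    root = ET.fromstring(root_xml)
    return [content_id_for(inc.get("href", "")) for inc in root.iter(XOP_INCLUDE)]
```

Applied to a complete version of the SOAP body above (with its namespace declarations filled in), `include_refs` would yield the single Content-ID `<http://example.org/me.png>`.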

See also
