XML data binding


XML data binding refers to a means of representing information in an XML document as a business object in computer memory. This allows applications to access the data in the XML through the object, rather than using the Document Object Model (DOM) or the Simple API for XML (SAX) to retrieve the data from a direct representation of the XML itself.


Description

An XML data binder accomplishes this by automatically creating a mapping between elements of the XML schema of the document to be bound and members of a class to be represented in memory.

When this process is applied to convert an XML document to an object, it is called unmarshalling (also called deserialization). The reverse process, to serialize an object as XML, is called marshalling.
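The following Python sketch shows unmarshalling and marshalling by hand, using only the standard library's `xml.etree.ElementTree`. The `Book` class, its fields, and the `<book>` document layout are hypothetical; a real data binder would derive the class and the mapping from an XML schema rather than relying on hand-written functions like these.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical business object. A data binder would generate a class like
# this from the XML schema; here it is written by hand.
@dataclass
class Book:
    book_id: str
    title: str
    year: int


def unmarshal(xml_text: str) -> Book:
    """Unmarshal (deserialize) a <book> element into a Book object."""
    root = ET.fromstring(xml_text)
    return Book(
        book_id=root.get("id"),
        title=root.findtext("title"),
        year=int(root.findtext("year")),  # element text becomes a typed field
    )


def marshal(book: Book) -> str:
    """Marshal (serialize) a Book object back into XML text."""
    root = ET.Element("book", id=book.book_id)
    ET.SubElement(root, "title").text = book.title
    ET.SubElement(root, "year").text = str(book.year)
    return ET.tostring(root, encoding="unicode")


doc = '<book id="b1"><title>Example</title><year>2003</year></book>'
book = unmarshal(doc)
print(book.year + 1)         # application code uses typed attributes, not DOM nodes
print(marshal(book) == doc)  # the round trip reproduces the original document
```

The point of the design is that application code never touches elements or attributes directly; only `unmarshal` and `marshal` know the XML layout.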

Approaches to data binding can be distinguished as follows:

Difficulties

Since XML is a document-oriented format and objects are (usually) not document-oriented, simple XML data binding mappings may ignore some of the structural information embedded in an XML document. Specifically, information such as comments, XML entity references, and sibling order may not be preserved in the object representation created by the binding application. However, this is not always the case: sufficiently powerful XML data binding tools can preserve all of the information stored in an XML document.
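A minimal sketch of this kind of information loss, assuming a naive binding that maps a hypothetical `<config>` document to a Python dictionary of integer values. Python's ElementTree parser drops comments by default, so the comment does not survive the round trip even though every element value does.

```python
import xml.etree.ElementTree as ET

source = (
    "<config><!-- tuned for production -->"
    "<timeout>30</timeout><retries>5</retries></config>"
)

# Naive binding: keep only element names and integer values.
root = ET.fromstring(source)  # the default parser discards the comment
bound = {child.tag: int(child.text) for child in root}

# Marshal the bound data back to XML.
out = ET.Element("config")
for name, value in bound.items():
    ET.SubElement(out, name).text = str(value)
round_tripped = ET.tostring(out, encoding="unicode")

print(bound)          # the values survive
print(round_tripped)  # the comment does not
```

In Python 3.8 and later, passing `ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))` as the parser keeps comments in the tree, but a binding that stores only field values would still have nowhere to put them.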

Similarly, since objects in computer memory are not inherently stored sequentially and may include links to other objects (including self-referential links), simple XML data binding mappings may not preserve all the information about an object when it is marshalled to XML. However, sufficiently powerful data binding tools analyze the graph structure of objects in memory and marshal cyclic object graphs to XML using standard XML reference attributes, such as ID and IDREF.
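The Python sketch below illustrates the reference technique under the assumption that the hypothetical `id` attribute would be declared with type ID and `manager` with type IDREF in a schema (nothing here validates that). The marshaller walks the object graph, writes each object exactly once, and replaces links with references; the unmarshaller creates all objects before resolving references, so a cycle can be restored. It is a simplified illustration, not the algorithm of any particular tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical object model: each person may reference a manager, and the
# references may form a cycle.
@dataclass(eq=False)
class Person:
    name: str
    manager: Optional["Person"] = field(default=None, repr=False)


def marshal_graph(start: Person) -> str:
    """Write each reachable object once; links become IDREF-style attributes."""
    ids = {}  # id(object) -> assigned XML id
    order = []
    node = start
    # Follow manager links until there are none left or an object is revisited.
    while node is not None and id(node) not in ids:
        ids[id(node)] = f"p{len(order)}"
        order.append(node)
        node = node.manager
    root = ET.Element("people")
    for p in order:
        el = ET.SubElement(root, "person", id=ids[id(p)], name=p.name)
        if p.manager is not None:
            el.set("manager", ids[id(p.manager)])  # a reference, not a nested copy
    return ET.tostring(root, encoding="unicode")


def unmarshal_graph(xml_text: str) -> List[Person]:
    """Create all objects first, then resolve the references between them."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): Person(el.get("name")) for el in root}
    for el in root:
        ref = el.get("manager")
        if ref is not None:
            by_id[el.get("id")].manager = by_id[ref]
    return list(by_id.values())


ada = Person("Ada")
bob = Person("Bob", manager=ada)
ada.manager = bob  # cycle: Ada -> Bob -> Ada
xml_text = marshal_graph(ada)
restored = unmarshal_graph(xml_text)
print(xml_text)
print(restored[0].manager.manager is restored[0])  # the cycle is rebuilt
```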

Alternatives

An alternative to automatic data binding relies instead on hand-crafted XPath expressions that extract data from XML. This approach has both benefits and drawbacks. On the benefit side, it needs only approximate knowledge of the XML tree structure (e.g., topology and tag names), which developers can determine by inspecting the XML data. Furthermore, XPath allows the application to bind only the relevant data items and filter out everything else, avoiding the processing that would be required to unmarshal the entire XML document.

The main drawback of this approach is the lack of automation in implementing the object model and the XPath expressions. Application developers must create these artifacts manually, which is time-consuming and potentially error-prone, and which hampers application maintenance when XML schemas and XML content models are updated. Another drawback is the lack of XML schema validation, which XML data binding tools typically apply automatically during unmarshalling. Schema validity is typically required in secure applications.
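As a rough Python illustration of this alternative, the sketch below uses the limited XPath subset supported by the standard library's `ElementTree.findall` to bind only the totals of open orders from a hypothetical `<orders>` feed. ElementTree still parses the whole document into a tree, so this shows selective binding rather than streaming, and nothing is checked against a schema, which reflects the validation drawback noted above.

```python
import xml.etree.ElementTree as ET

feed = """<orders>
  <order id="1" status="open"><total>19.99</total><notes>gift wrap</notes></order>
  <order id="2" status="closed"><total>5.00</total></order>
  <order id="3" status="open"><total>42.50</total></order>
</orders>"""

root = ET.fromstring(feed)

# Hand-written path expression: select open orders anywhere in the tree.
# Only the fields the application needs are extracted; <notes> is ignored.
open_totals = {
    order.get("id"): float(order.findtext("total"))
    for order in root.findall(".//order[@status='open']")
}
print(open_totals)
```

If the feed's structure changes (for example, `<total>` is renamed), the path expressions must be updated by hand, which is the maintenance cost described above.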

Data binding in general

One of XML data binding's strengths is the ability to deserialize objects across programs, languages, and platforms.[1] For example, a time series of structured objects can be dumped from a datalogger written in C on an embedded processor, carried across the network to be processed in Perl, and finally visualized in Octave. The structure and the data remain consistent and coherent throughout the journey, and no custom formats or parsers are required. This is not unique to XML: YAML, for example, is emerging as a powerful data-binding alternative to XML, and JSON (which can be regarded as a subset of YAML) is often suitable for lightweight or restricted applications.
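To make the cross-platform point concrete, the hypothetical Python sketch below encodes the same datalogger-style time series as XML and as JSON using only standard-library modules, and reads each encoding back into the same in-memory structure. The element, attribute, and key names are illustrative assumptions; a consumer written in another language would need only an XML or JSON parser to recover the data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical datalogger samples: (seconds since start, temperature in degrees C).
samples = [(0, 21.5), (60, 21.7), (120, 21.6)]


def to_xml(rows):
    root = ET.Element("series")
    for t, value in rows:
        ET.SubElement(root, "sample", t=str(t), value=str(value))
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    return [(int(s.get("t")), float(s.get("value"))) for s in ET.fromstring(text)]


def to_json(rows):
    return json.dumps([{"t": t, "value": value} for t, value in rows])


def from_json(text):
    return [(d["t"], d["value"]) for d in json.loads(text)]


# Both encodings carry the same structure and values back into memory.
print(from_xml(to_xml(samples)) == samples)
print(from_json(to_json(samples)) == samples)
```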

XML data binding frameworks

| Name | Programming language | License | First release | Last stable release | Code generation from XSD | Custom mapping | Note |
|---|---|---|---|---|---|---|---|
| Apache Commons Betwixt | Java | Apache | January 28, 2003 | 0.8 | Unknown | Unknown | Dormant. Serializes objects to XML without requiring an XML schema definition |
| Apache XMLBeans | Java | Apache License 2.0 | | 5.1.1, August 29, 2022 | Yes | Unknown | |
| Castor | Java | Apache 2.0 | | 1.4.1, May 15, 2016 | Unknown | Unknown | Earlier versions also supported Java-to-SQL persistence, but this has since been forked into a separate project |
| CodeSynthesis XSD | C++ | GNU GPL and proprietary | | 4.0.0, July 22, 2014 | Unknown | Unknown | With SAX or tree-like mapping into C++ classes |
| gSOAP | C and C++ | GNU GPL and proprietary | December 8, 2000 | 2.8.131, September 23, 2023 | Yes | Yes | Supports XML schema, WSDL, and SOAP; XML schemas are not required to serialize C/C++ data to XML; custom mapping of XML schema types to C/C++ types via a type mapping file and from C/C++ types to compatible XML schema types by source code annotation |
| Java Architecture for XML Binding (JAXB) | Java | ? | | | Yes | Yes | |
| JiBX | Java | BSD License | | 1.2.6, January 1, 2015 | Yes | Yes | Maps classes to XML schemas via bytecode manipulation |
| Simple | Java | Apache 2.0 | | 2.7.1, February 9, 2017 | No | Yes | |
| System.Xml.Serialization | C# | ? | | | Yes | No | Part of the .NET Framework; contains XML data binding classes and includes the xsd.exe tool to generate classes from an XSD schema |
| xmlbeansxx | C++ | Apache 2.0 | | 0.9.1, April 1, 2008 | Unknown | Unknown | C++ port of Apache XMLBeans |
| XStream | Java | BSD-style license | January 1, 2004 | 1.4.10, May 23, 2017 | Unknown | Unknown | Also capable of serializing to JSON |
| Zeus | Java | ? | | 3.5 beta, August 16, 2002 | Unknown | Unknown | |


References

1. "What is XML binding". IBM. Retrieved 2024-04-16.