XML data binding refers to a means of representing information in an XML document as a business object in computer memory. This allows applications to access the data in the XML from the object, rather than using the DOM or SAX to retrieve the data from a direct representation of the XML itself.
It makes it possible to read and write XML data using a class library, written in a programming language such as C++, C#, or Java, that is created specifically for a given XML data format. [1] Whilst it is possible to write such a library by hand, XML data binding tools generate the source code that performs these tasks.
An XML data binder accomplishes this by automatically creating a mapping between the elements of the XML schema of the document to be bound and the members of a class to be represented in memory.
When this process is applied to convert an XML document to an object, it is called unmarshalling (also called deserialization). The reverse process, to serialize an object as XML, is called marshalling.
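The two directions can be illustrated with a minimal hand-written sketch in Python. It is not the output of any particular binding tool; the `Book` class, the `<book>` document layout, and the function names are assumptions made for illustration, and a real binder would typically generate equivalent code from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical business object; its fields mirror the sample XML below.
    isbn: str
    title: str
    price: float


def unmarshal_book(xml_text: str) -> Book:
    """Unmarshalling: build a Book object from a <book> document."""
    root = ET.fromstring(xml_text)
    return Book(
        isbn=root.get("isbn"),
        title=root.findtext("title"),
        price=float(root.findtext("price")),
    )


def marshal_book(book: Book) -> str:
    """Marshalling: write a Book object back out as a <book> document."""
    root = ET.Element("book", {"isbn": book.isbn})
    ET.SubElement(root, "title").text = book.title
    ET.SubElement(root, "price").text = str(book.price)
    return ET.tostring(root, encoding="unicode")


doc = '<book isbn="12345"><title>XML Basics</title><price>29.5</price></book>'
book = unmarshal_book(doc)      # application code now works with book.title, book.price
round_trip = marshal_book(book)  # back to XML text
```

Application code reads `book.price` as a number rather than walking DOM nodes or handling SAX events, which is the convenience described above.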
Approaches to data binding can be distinguished as follows:
Since XML is a document-oriented format and objects are (usually) not document-oriented, simple XML data binding mappings may ignore some of the structural information embedded in an XML document. Specifically, information such as comments, XML entity references, and sibling order may not be preserved in the object representation created by the binding application. However, this is not always the case; sufficiently powerful XML data binding tools are capable of preserving 100% of the information stored in an XML document.
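The following sketch shows one way such loss can occur. It assumes a deliberately simple, hypothetical mapping that groups child elements by tag name; the `<order>` document is invented for illustration. The comment disappears because Python's default `ElementTree` parser discards comments, and the sibling order changes because the object keeps items and notes in separate lists.

```python
import xml.etree.ElementTree as ET

source = (
    "<order>"
    "<!-- rush delivery -->"
    "<item>pen</item>"
    "<note>gift</note>"
    "<item>ink</item>"
    "</order>"
)

root = ET.fromstring(source)  # the default parser drops the comment

# Simple binding: children are grouped by kind, as a plain class with
# an "items" list and a "notes" list would store them.
order = {
    "items": [e.text for e in root.findall("item")],
    "notes": [e.text for e in root.findall("note")],
}

# Marshalling writes all items first, then all notes, so the original
# interleaving of <item> and <note> siblings is not reproduced.
out = ET.Element("order")
for text in order["items"]:
    ET.SubElement(out, "item").text = text
for text in order["notes"]:
    ET.SubElement(out, "note").text = text
result = ET.tostring(out, encoding="unicode")
```

A binding that must round-trip documents exactly would need to record comments and the position of each child in addition to its value.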
Similarly, since objects residing in computer memory are not inherently sequentially stored, and may include links to other objects (including self-referential links), simple XML data binding mappings may not be capable of preserving all the information about an object when it is marshalled to XML. However, sufficiently powerful data binding tools analyse the graph structure of objects residing in memory so that (cyclic) object graphs can be marshalled to XML using standard XML reference attributes.
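A rough sketch of the idea, under stated assumptions: each object is written once with an identifier, and any later occurrence is written as a reference to that identifier. The attribute names `id` and `ref`, the `Node` class, and the `<graph>` layout are illustrative choices, not the convention of any specific tool.

```python
import xml.etree.ElementTree as ET


class Node:
    # Hypothetical in-memory object with a link that may form a cycle.
    def __init__(self, name):
        self.name = name
        self.next = None


def marshal_graph(start: Node) -> str:
    ids = {}  # maps id(object) -> XML identifier for objects already written

    def emit(node, parent):
        key = id(node)
        if key in ids:
            # Already serialized: emit a reference instead of recursing forever.
            ET.SubElement(parent, "node", {"ref": ids[key]})
            return
        ids[key] = f"n{len(ids) + 1}"
        elem = ET.SubElement(parent, "node", {"id": ids[key], "name": node.name})
        if node.next is not None:
            emit(node.next, elem)

    root = ET.Element("graph")
    emit(start, root)
    return ET.tostring(root, encoding="unicode")


a, b = Node("a"), Node("b")
a.next, b.next = b, a  # a -> b -> a forms a cycle
xml_text = marshal_graph(a)
```

Without the `ids` table, the recursive walk would never terminate on the cycle; with it, the second visit to `a` becomes a `ref` element that an unmarshaller can resolve back to the same object.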
An alternative to automatic data binding relies on hand-crafted XPath expressions that extract data from XML. This approach has both benefits and drawbacks. One benefit is that it needs only approximate knowledge of the XML tree structure (e.g., topology and tag names), which developers can determine by looking at the XML data. Another is that XPath allows the application to bind only the relevant data items and filter out everything else, avoiding the processing that would be required to unmarshal the entire XML document. The main drawback is the lack of automation: application developers have to create the object model and the XPath expressions manually, which is time-consuming, potentially error-prone, and hampers application maintenance when XML schemas and XML content models are updated. Another drawback is the lack of XML schema validation, which XML data binding tools typically apply automatically during unmarshalling. Schema validity is typically required in secure applications.
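As a small illustration of the XPath-based approach, the sketch below pulls two pieces of data out of a document and ignores the rest. The `<catalog>` document is invented for the example, and Python's `xml.etree.ElementTree` is used only because it is in the standard library; it supports a limited subset of XPath, so a full XPath engine would accept richer expressions.

```python
import xml.etree.ElementTree as ET

catalog = """
<catalog>
  <book isbn="111"><title>XML Basics</title><price>29.50</price></book>
  <book isbn="222"><title>Advanced XSLT</title><price>45.00</price></book>
  <magazine issue="7"><title>Schema Monthly</title></magazine>
</catalog>
"""

root = ET.fromstring(catalog)

# Hand-written path expressions select only what the application needs;
# the <magazine> element is never touched.
titles = [t.text for t in root.findall("./book/title")]
price_222 = float(root.findtext("./book[@isbn='222']/price"))
```

The expressions encode assumptions about tag names and nesting, so they have to be revised by hand if the document format changes, and nothing here checks the document against a schema.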
One of XML data binding's strengths is the ability to serialize and deserialize objects across programs, languages, and platforms. [2] For example, a time series of structured objects can be dumped from a datalogger written in C on an embedded processor, sent across the network for processing in Perl, and finally visualized in Octave. The structure and the data remain consistent and coherent throughout the journey, and no custom formats or parsers are required. This is not unique to XML. YAML, for example, is emerging as a powerful data-binding alternative to XML. JSON (which can be regarded as a subset of YAML) is often suitable for lightweight or restricted applications.
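The point that the same object can travel in more than one interchange format can be sketched as follows. The `Sample` record and its field names are hypothetical, standing in for one datalogger reading; the sketch shows the receiving side only and uses Python's standard library rather than any binding tool.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Sample:
    # Hypothetical datalogger record.
    timestamp: int
    channel: str
    value: float


def to_xml(sample: Sample) -> str:
    elem = ET.Element("sample", {"timestamp": str(sample.timestamp)})
    ET.SubElement(elem, "channel").text = sample.channel
    ET.SubElement(elem, "value").text = repr(sample.value)
    return ET.tostring(elem, encoding="unicode")


def from_xml(text: str) -> Sample:
    elem = ET.fromstring(text)
    return Sample(
        timestamp=int(elem.get("timestamp")),
        channel=elem.findtext("channel"),
        value=float(elem.findtext("value")),
    )


def to_json(sample: Sample) -> str:
    return json.dumps(asdict(sample))


def from_json(text: str) -> Sample:
    return Sample(**json.loads(text))


reading = Sample(timestamp=1700000000, channel="temp", value=21.5)
```

Either text form can be produced in one language and consumed in another, provided both sides agree on the element or key names and on the value types.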
Name | Programming Language | License | First release | Last stable release | Code generation from XSD | Custom mapping | Note |
---|---|---|---|---|---|---|---|
Apache Commons Betwixt | Java | Apache | January 28, 2003 | 0.8 | Unknown | Unknown | Dormant. Serializes objects to XML without requiring an XML schema definition |
Apache XMLBeans | Java | Apache License 2.0 | | 5.1.1, August 29, 2022 | Yes | Unknown | |
Castor | Java | Apache 2.0 | | 1.4.1, May 15, 2016 | Unknown | Unknown | Earlier versions also supported Java-to-SQL persistence, but this has since been forked into a separate project |
CodeSynthesis XSD | C++ | GNU GPL and proprietary | | 4.0.0, July 22, 2014 | Unknown | Unknown | SAX or tree-like mapping into C++ classes |
gSOAP | C and C++ | GNU GPL and proprietary | December 8, 2000 | 2.8.131, September 23, 2023 | Yes | Yes | Supports XML schema, WSDL, and SOAP; XML schemas are not required to serialize C/C++ data to XML; custom mapping of XML schema types to C/C++ types via a type mapping file and from C/C++ types to compatible XML schema types by source code annotation |
Java Architecture for XML Binding (JAXB) | Java | ? | | | Yes | Yes | |
JiBX | Java | BSD License | | 1.2.6, January 1, 2015 | Yes | Yes | Maps classes to XML schemas via bytecode manipulation |
Liquid XML Data Binder | C++, C#, Java, Visual Basic .NET, Visual Basic 6 (COM) | Freeware and proprietary | June 1, 2001 | June 18, 2024 | Yes | Yes | Supports XML schema (XSD), DTD, XDR, WSDL. Serializes XML to JSON and JSON to XML. |
Liquid XML Objects | C# and Visual Basic .NET (supports XSD 1.1) | Freeware and proprietary | March 3, 2019 | June 18, 2024 | Yes | Yes | Direct replacement for XSD.exe. Integrated within Microsoft Visual Studio. Supports XML schema (XSD 1.0 and XSD 1.1), DTD, WSDL. Serializes XML to JSON and JSON to XML. |
Simple | Java | Apache 2.0 | | 2.7.1, February 9, 2017 | No | Yes | |
System.Xml.Serialization | C# | ? | | | Yes | No | Part of the .NET framework; contains XML data binding classes and includes the xsd.exe tool to generate classes from an XSD schema |
xmlbeansxx | C++ | Apache 2.0 | | 0.9.1, April 1, 2008 | Unknown | Unknown | C++ port of Apache XMLBeans |
XStream | Java | BSD-style license | January 1, 2004 | 1.4.10, May 23, 2017 | Unknown | Unknown | Also capable of serializing to JSON |
Zeus | Java | ? | | 3.5 beta, August 16, 2002 | Unknown | Unknown | |