Simple XML

Last updated

Simple XML is a variation of XML containing only elements. All attributes are converted into elements. Not having attributes or other xml elements such as the XML declaration / DTDs allows the use of simple and fast parsers. This format is also compatible with mainstream XML parsers.

XML Markup language developed by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The W3C's XML 1.0 Specification and several other related specifications—all of them free open standards—define XML.

Contents

Structure

For example:

<Agenda><type>gardening</type><Activity><type>Watering</type><golf-course><time>6:00</time></golf-course><yard><time>7:00</time></yard></Activity><Activity><type>cooking</type><lunch><time>12:00</time></lunch></Activity></Agenda>

would represent:

<?xml version="1.0" encoding="UTF-8"?><Agendatype="gardening"><Activitytype="Watering"><golf-coursetime="6:00"/><yardtime="7:00"/></Activity><Activitytype="cooking"><lunchtime="12:00"/></Activity></Agenda>

Validation

Simple XML uses a simple XPath list for validation. The XML snippet above for example, would be represented by:

XPath is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).

 /Agenda/type|(Activity/type|(*/time))

or a bit more human readable as:

 /Agenda/type  /Agenda/Activity/type  /Agenda/Activity/*/time

This allows the XML to be processed as a stream (without creating an object model in memory) with fast validation.

Related Research Articles

A document type definition (DTD) is a set of markup declarations that define a document type for a SGML-family markup language.

HTML Hypertext Markup Language

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad of cornerstone technologies for the World Wide Web.

Standard Generalized Markup Language

The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 defines generalized markup:-

Generalized markup is based on two postulates:

In computing, the Java API for XML Processing, or JAXP, one of the Java XML Application programming interfaces (API)s, provides the capability of validating and parsing XML documents. It has three basic parsing interfaces:

SAX is an event-driven online algorithm for parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, i.e. building the full AST of an XML document for convenience of the user, SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making single pass through the input stream.

XSD, a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document. They can check if it adheres to the description of the element it is placed in.

YAML is a human-readable data serialization language. It is commonly used for configuration files, but could be used in many applications where data is being stored or transmitted. YAML targets many of the same communications applications as XML but has a minimal syntax which intentionally breaks compatibility with SGML . It uses both Python-style indentation to indicate nesting, and a more compact format that uses [] for lists and {} for maps making YAML 1.2 a superset of JSON.

In web development, "tag soup" is a pejorative term that refers to syntactically or structurally incorrect HTML written for a web page. Because web browsers have historically treated HTML syntax or structural errors leniently, there has been little pressure for web developers to follow published standards, and therefore there is a need for all browser implementations to provide mechanisms to cope with the appearance of "tag soup", accepting and correcting for invalid syntax and structure where possible.

OPML is an XML format for outlines. Originally developed by UserLand as a native file format for the outliner application in its Radio UserLand product, it has since been adopted for other uses, the most common being to exchange lists of web feeds between web feed aggregators.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

JSON text-based open standard designed for human-readable data interchange

In computing, JavaScript Object Notation (JSON) is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute–value pairs and array data types. It is a very common data format used for asynchronous browser–server communication, including as a replacement for XML in some AJAX-style systems.

XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.

The term CDATA, meaning character data, is used for distinct, but related, purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

XMLBeans is a Java-to-XML binding framework which is part of the Apache Software Foundation XML project.

DDL is part of the MPEG-7 standard. It gives an important set of tools for the users to create their own Description Schemes (DSs) and Descriptors (Ds). DDL defines the syntax rules to define, combine, extend and modify Description Schemes and Descriptors.

Extensible Hypertext Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used Hypertext Markup Language (HTML), the language in which Web pages are formulated.

libSBML

LibSBML is an open-source software library that provides an application programming interface (API) for the SBML format. The libSBML library can be embedded in a software application or used in a web servlet as part of the application or servlet's implementation of support for reading, writing, and manipulating SBML documents and data streams. The core of libSBML is written in ISO standard C++; the library provides API for many programming languages via interfaces generated with the help of SWIG.

References

  1. http://www.w3.org/XML/simple-XML.html