XML Information Set

XML Information Set (XML Infoset) is a W3C specification describing an abstract data model of an XML document in terms of a set of information items. [1] The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document.

An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.

An information set can contain up to eleven different types of information items:

  1. The Document Information Item (always present)
  2. Element Information Items
  3. Attribute Information Items
  4. Processing Instruction Information Items
  5. Unexpanded Entity Reference Information Items
  6. Character Information Items
  7. Comment Information Items
  8. The Document Type Declaration Information Item
  9. Unparsed Entity Information Items
  10. Notation Information Items
  11. Namespace Information Items
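
As a rough illustration of the item types listed above, the following sketch walks a DOM tree built by Python's standard `xml.dom.minidom` module and tallies nodes under the name of the closest matching information item. The DOM is a separate API rather than the infoset itself, so the mapping is an approximation: the sample document, the `count_items` helper and the label strings are illustrative assumptions, and namespace declarations, entities and notations are not modelled.

```python
from collections import Counter
from xml.dom import minidom, Node

# Illustrative sample document (not taken from the specification).
SAMPLE = (
    '<?xml version="1.0"?>'
    '<!-- a comment -->'
    '<?style sheet?>'
    '<doc id="a1"><p>Hi</p></doc>'
)

def count_items(xml_text):
    """Tally DOM nodes under the closest matching information item name."""
    counts = Counter()

    def visit(node):
        if node.nodeType == Node.DOCUMENT_NODE:
            counts["document"] += 1
        elif node.nodeType == Node.ELEMENT_NODE:
            counts["element"] += 1
            # Note: the DOM also lists xmlns declarations here, whereas the
            # infoset keeps them apart; the sample declares no namespaces.
            counts["attribute"] += len(node.attributes)
        elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            # The infoset has one character information item per character.
            counts["character"] += len(node.data)
        elif node.nodeType == Node.COMMENT_NODE:
            counts["comment"] += 1
        elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            counts["processing instruction"] += 1
        elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
            counts["document type declaration"] += 1
        for child in node.childNodes:
            visit(child)

    visit(minidom.parseString(xml_text))
    return dict(counts)

print(count_items(SAMPLE))
```

The XML declaration is not counted, because it is not a processing instruction and does not give rise to an information item of its own.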

XML was initially developed without a formal definition of its infoset. The infoset was formalised only by later work that began in 1999 and was first published as a separate W3C Working Draft in late December that year. [2] The Second Edition of the Infoset Recommendation was published on 4 February 2004. [3] If a 2.0 version of the XML standard is ever published, it would likely absorb the Infoset Recommendation as an integral part of that standard.

Infoset augmentation

Infoset augmentation or infoset modification refers to the process of modifying the infoset during schema validation, for example by adding default attributes. The augmented infoset is called the post-schema-validation infoset, or PSVI. [4]

Infoset augmentation is somewhat controversial, with claims that it is a violation of modularity and tends to cause interoperability problems, since applications get different information depending on whether or not validation has been performed. [5]
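
The interoperability concern can be pictured with a toy model. The sketch below does not perform real XML Schema validation: the `SCHEMA_DEFAULTS` table and the `augment` helper are hypothetical stand-ins for the attribute defaults a schema would declare. It only shows how an application reading the same document sees different attribute values depending on whether the augmentation step has run.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema-declared attribute defaults:
# element name -> {attribute name: default value}.
SCHEMA_DEFAULTS = {"item": {"currency": "USD"}}

def augment(root, defaults=SCHEMA_DEFAULTS):
    """Return a copy of the tree with missing default attributes filled in."""
    augmented = copy.deepcopy(root)
    for elem in augmented.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            # Only add the default when the document did not specify a value.
            elem.attrib.setdefault(name, value)
    return augmented

root = ET.fromstring(
    '<order><item price="5"/><item price="7" currency="EUR"/></order>'
)

# View without augmentation: the first item has no currency at all.
plain = [item.get("currency") for item in root.iter("item")]
# View after augmentation: the default value has been added.
validated = [item.get("currency") for item in augment(root).iter("item")]

print(plain)      # [None, 'EUR']
print(validated)  # ['USD', 'EUR']
```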

Infoset augmentation is supported by XML Schema but not RELAX NG.

Serialization

An XML Information Set is typically serialized as XML. [6] Serialization formats also exist for Binary XML, CSV, [7] and JSON. [8]
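
A minimal sketch of a non-XML serialization follows. The JSON layout used here (the keys `name`, `attributes`, `children` and `text`) is an ad hoc convention assumed for illustration; it is not the mapping used by any of the cited implementations, and it drops comments, processing instructions and text that follows a child element.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Map an element to a plain dictionary (assumed, ad hoc layout)."""
    return {
        "name": elem.tag,
        "attributes": dict(elem.attrib),
        "children": [element_to_dict(child) for child in elem],
        # Only the text before the first child element is kept.
        "text": elem.text or "",
    }

def to_json(xml_text):
    """Parse an XML string and serialize its element tree as JSON."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)))

print(to_json('<a x="1"><b>hi</b></a>'))
```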

References

  1. W3C XML Infoset
  2. "XML Information Set" (Working Draft ed.). W3C. 20 December 1999.
  3. "XML Information Set" (Second ed.). W3C. 4 February 2004.
  4. XML Schema 1.1 Part 1: Structures
  5. James Clark, "RELAX NG and W3C XML Schema", 4 June 2002. Archived September 27, 2007, at the Wayback Machine.
  6. "Extensible Markup Language (XML)". W3C. Retrieved 9 October 2014.
  7. XmlCsvReader Implementation
  8. Apache CXF JSON Support