XML tree

XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.

XML documents must contain a root element (one that contains all other elements). All elements in an XML document can contain sub-elements, text and attributes. The tree represented by an XML document starts at the root element and branches to the lowest level of elements. Although there is no consensus on the terminology used for XML trees, at least two standard terminologies have been released by the W3C:

XPath defines a syntax named XPath expressions that identifies one or more internal components (elements, attributes, etc.) of an XML document. XPath is widely used to access XML-structured data.

The XML Information Set, or XML infoset, describes an abstract data model for XML documents in terms of information items. It is often used in the specifications of XML languages, for its convenience in describing constraints on constructs those languages allow.

Representation as trees

In mathematics, a tree is an undirected graph in which any two vertices are connected by exactly one simple path. Any connected graph without simple cycles is a tree. A tree data structure simulates a hierarchical tree structure with a set of linked nodes. A hierarchy consists of an order defined on a set. The term hierarchy is used to stress a hierarchical relation among the elements.
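
The graph-theoretic definition above can be checked mechanically. The following is a minimal sketch in Python, assuming the graph is given as a list of vertices and a list of undirected edges; the function name `is_tree` is hypothetical. It relies on the standard fact that a connected graph with n vertices and n - 1 edges has no simple cycles.

```python
from collections import defaultdict

def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a tree."""
    vertices = list(vertices)
    # A tree on n vertices has exactly n - 1 edges (an empty graph is rejected here).
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(vertices)

# Hypothetical sample data: element names as vertices, parent-child links as edges.
nodes = ["Product", "Name", "Details", "Description", "Price"]
links = [("Product", "Name"), ("Product", "Details"),
         ("Details", "Description"), ("Details", "Price")]
print(is_tree(nodes, links))
```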

The XML specification defines an XML document as well-formed text if it satisfies a list of syntax rules defined in the specification. The specification is long; however, two key points relating to the tree structure of an XML document are:

These features resemble those of trees, in that there is a single root node, and an order to the elements. XML has appeared as a first-class data type in other languages. The ECMAScript for XML (E4X) extension to JavaScript explicitly defines two specific objects (XML and XMLList), which support XML document nodes and XML node lists as distinct objects and use a dot notation to specify parent-child relationships. [1] These data structures represent XML documents as a tree structure.

An XML tree can be represented graphically as simply as an ASCII chart or as a more graphically complex hierarchy. For instance, the ASCII tree and the XML document below have the same structure. An XML tree does not show the content of an instance document, only the structure of the document. In this example, Product is the root element of the tree, and the two child nodes of Product are Name and Details. Details contains two child nodes, Description and Price. The tree command in Windows and Unix-like systems produces a similar tree structure and path.

Product
├───Name
└───Details
    ├───Description
    └───Price

<Product>
  <Name>Widget</Name>
  <Details>
    <Description>This Widget is the highest quality widget.</Description>
    <Price>5.50</Price>
  </Details>
</Product>

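As a rough illustration, the ASCII tree of the Product example can be derived from the XML text with Python's standard `xml.etree.ElementTree` module. The helper name `render_tree` is hypothetical, and the sketch prints element names only, not text content or attributes.

```python
import xml.etree.ElementTree as ET

PRODUCT_XML = (
    "<Product><Name>Widget</Name><Details>"
    "<Description>This Widget is the highest quality widget.</Description>"
    "<Price>5.50</Price></Details></Product>"
)

def render_tree(element, prefix=""):
    """Return ASCII-tree lines for the descendants of `element`."""
    lines = []
    children = list(element)
    for index, child in enumerate(children):
        is_last = index == len(children) - 1
        lines.append(prefix + ("└───" if is_last else "├───") + child.tag)
        # Below the last child, indent with spaces; otherwise keep a vertical bar.
        lines.extend(render_tree(child, prefix + ("    " if is_last else "│   ")))
    return lines

root = ET.fromstring(PRODUCT_XML)
tree_lines = [root.tag] + render_tree(root)
print("\n".join(tree_lines))
```

Because only `child.tag` is read, two documents with different text content but the same element structure produce the same chart.
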
XPath Data Model

XPath, the XML Path Language, is a query language for selecting nodes from an XML document. XPath defines a syntax named XPath expressions that can query an XML document for one or more internal components (elements, attributes, etc.). XPath is widely used in other core-XML specifications and in programming libraries for accessing XML-encoded data. [2]
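
A small sketch of XPath-style selection, using the limited XPath subset that Python's standard `xml.etree.ElementTree` module supports (it is not a complete XPath implementation). The sample document reuses the Product example, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Product><Name>Widget</Name><Details>"
    "<Description>A sample description.</Description>"
    "<Price>5.50</Price></Details></Product>"
)

# Path relative to the root element: child Details, then its child Price.
price = root.find("./Details/Price")

# ".//" searches at any depth; "*" matches every child element of Details.
detail_tags = [element.tag for element in root.findall(".//Details/*")]

print(price.text)
print(detail_tags)
```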

XPath Data Model terminology

The XPath Data Model is a long specification, and goes into many features unrelated to XML trees. Listed below are key terms from that specification and the XML specification. [3] [4]

Instance
The data model represented as a sequence.
Instance document
A document using and conforming to the same sequence/XML tree.
Sequence
An ordered collection of zero or more items. A sequence cannot be a member of a sequence. A single item appearing individually is modeled as a sequence containing one item.
Element
A node within the sequence that may contain
Node
Any item represented in the XML tree/sequence.
Root Node
The topmost element of the tree. All other elements and nodes must be contained within the root node.
Item
A node or an atomic value.
Value space
The part of an item that contains data rather than additional elements.
Atomic type
A primitive simple type or a type derived by restriction from another atomic type.
Atomic value
A value contained in the value space that is an atomic type.
QName
The qualified name of an element. It must conform to the naming rules for XML objects: it must start with a letter or underscore; it is case-sensitive; it cannot start with the letters "xml" (in any case); it can contain letters, digits, hyphens, underscores, and periods; and it cannot contain spaces.
Expanded-QName
The fully qualified name of an element. It may include a prefix and namespace. It must include the local name of the element.
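
To make the distinction between a prefixed name and its expanded form concrete, here is a hedged sketch using Python's `xml.etree.ElementTree`, which stores element names as `{namespace-URI}local-name` and discards the prefix (unlike the data model, where an expanded-QName may retain it). The namespace URI and the helper `split_expanded_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/catalog"  # hypothetical namespace URI

root = ET.fromstring(
    f'<c:Product xmlns:c="{NS}"><c:Name>Widget</c:Name></c:Product>'
)

def split_expanded_name(tag):
    """Split a '{namespace}local' tag into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

# The prefix "c" is resolved to the namespace URI at parse time.
print(root.tag)
print(split_expanded_name(root.tag))

# ET.QName builds the same expanded form from a URI and a local name.
name_tag = ET.QName(NS, "Name").text
print(name_tag == root[0].tag)
```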

Within a given tree, document order satisfies the following constraints: [5]
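
For element nodes, document order matches the order in which start tags appear in the serialized document, that is, a depth-first, pre-order walk of the tree. The sketch below shows this with `Element.iter()` from Python's `xml.etree.ElementTree`; it covers elements only and says nothing about attribute or namespace nodes.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Product><Name>Widget</Name><Details>"
    "<Description>A sample description.</Description>"
    "<Price>5.50</Price></Details></Product>"
)

# iter() visits each element before its children, and siblings left to right.
document_order = [element.tag for element in root.iter()]
print(document_order)
```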

XML Information Set

XML Information Set (XML Infoset) describes an abstract data model of an XML document in terms of a set of information items. The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document. The infoset makes it convenient to describe constraints on the XML constructs other XML languages allow. An XML document has an information set if it is well-formed and satisfies the namespace constraints. An information set can contain up to eleven different types of information items:

XML Information Set terminology

The XML Information Set is a long specification, and goes into many features unrelated to XML trees. Listed below are the most important terms relating to XML tree terminology:

"There is exactly one document information item in the information set, and all other information items are accessible from the properties of the document information item, either directly or indirectly through the properties of other information items. The document information item has the following properties:

There is an element information item for each element appearing in the XML document. One of the element information items is the value of the [document element] property of the document information item, corresponding to the root of the element tree, and all other element information items are accessible by recursively following its [children] property. An element information item has the following properties:

There is an attribute information item for each attribute (specified or defaulted) of each element in the document, including namespace declarations. The latter however appear as members of an element's [namespace attributes] property rather than its [attributes] property. Attributes declared in the DTD with no default value and not specified in the element's start tag are not represented by attribute information items. An attribute information item has the following properties:
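
Python's `xml.etree.ElementTree` is not an Infoset implementation, but its element objects loosely mirror the split described above: iterating an element yields its child elements, `attrib` holds its ordinary attributes, and namespace declarations are not reported in `attrib`. The sketch below is an analogy under that assumption; the namespace URI and the `id` attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<c:Product xmlns:c="http://example.com/catalog" id="42">'
    "<c:Name>Widget</c:Name></c:Product>"
)

# Child elements, roughly analogous to the [children] property.
children = [child.tag for child in root]

# Ordinary attributes only; the xmlns:c declaration is not included.
attributes = dict(root.attrib)

print(children)
print(attributes)
```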

Notes

  1. "Processing XML with E4X". Mozilla Developer Center. Mozilla Foundation.
  2. XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), 14 December 2010, http://www.w3.org/TR/xpath-datamodel/
  3. XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), 14 December 2010, http://www.w3.org/TR/xpath-datamodel/
  4. Extensible Markup Language (XML) 1.0 (Fifth Edition), 26 November 2008, retrieved: 24 July 2018, https://www.w3.org/TR/xml/REC-xml-20081126-review.html#sec-terminology
  5. XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), 14 December 2010, http://www.w3.org/TR/xpath-datamodel/
  6. XML Information Set (Second Edition), 4 February 2004, http://www.w3.org/TR/xml-infoset/
  7. XML Information Set (Second Edition), 4 February 2004, http://www.w3.org/TR/xml-infoset/
