Each XML document has exactly one single root element. It encloses all the other elements and is, therefore, the sole parent element to all the other elements. ROOT elements are also called document elements. In HTML, the root element is the <html>
element. [1]
The World Wide Web Consortium defines not only the specifications for XML itself, [2] but also the DOM, which is a platform- and language-independent standard object model for representing XML documents. DOM Level 1 defines, for every XML document, an object representation of the document
itself and an attribute or property on the document called documentElement
. This property provides access to an object of type element
which directly represents the root element of the document. [3]
<parent><child>content</child><childattribute="att"/></parent>
There can be other XML nodes outside of the root element. [4] In particular, the root element may be preceded by a prolog, which itself may consist of an XML declaration, optional comments, processing instructions and whitespace, followed by an optional DOCTYPE declaration and more optional comments, processing instructions and whitespace. After the root element, there may be further optional comments, processing instructions and whitespace within the document. [5]
Within the root element, apart from any number of attributes and other elements, there may also be more optional text, comments, processing instructions and whitespace.
A more expanded example of an XML document follows, demonstrating some of these extra nodes along with a single rootElement
element.
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE example [<!ENTITY copy "©">]> <rootElementattribute="xyz"><contentElement/></rootElement><!-- comment nodes may appear almost anywhere -->
A document type definition (DTD) is a set of markup declarations that define a document type for an SGML-family markup language.
The Document Object Model (DOM) is a cross-platform and language-independent interface that treats an XML or HTML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods allow programmatic access to the tree; with them one can change the structure, style or content of a document. Nodes can have event handlers attached to them. Once an event is triggered, the event handlers get executed.
The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.
The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.
In computing, the Java API for XML Processing, or JAXP, one of the Java XML Application programming interfaces, provides the capability of validating and parsing XML documents. It has three basic parsing interfaces:
SAX is an event-driven online algorithm for parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole—building the full abstract syntax tree of an XML document for convenience of the user—SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making a single pass through the input stream.
XSD, a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document, to assure it adheres to the description of the element it is placed in.
An HTML element is a type of HTML document component, one of several types of HTML nodes. HTML document is composed of a tree of simple HTML nodes, such as text nodes, and HTML elements, which add semantics and formatting to parts of document. Each element can have HTML attributes specified. Elements can also have content, including other elements and text.
YAML is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax which intentionally differs from SGML. It uses both Python-style indentation to indicate nesting, and a more compact format that uses [...]
for lists and {...}
for maps thus JSON files are valid YAML 1.2.
A node is a basic unit of a data structure, such as a linked list or tree data structure. Nodes contain data and also may link to other nodes. Links between nodes are often implemented by pointers. It is a computer connected to the internet that participates in the peer to peer network.
In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.
XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.
The following tables compare Document Object Model (DOM) compatibility and support for a number of JavaScript engines used in web browsers.
In computing, quirks mode is a technique used by some web browsers for the sake of maintaining backward compatibility with web pages designed for old web browsers instead of strictly complying with W3C and IETF standards in standards mode.
XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.
Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.
XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.
In computing, a polyglot markup is a document or script written in a valid form of multiple markup languages, which performs the same output, independent of the markup's parser, layout engine, or interpreter. In general, the polyglot markup is a common subset of two or more languages, that can be used as a robust or simplified profile.
A document type declaration, or DOCTYPE, is an instruction that associates a particular XML or SGML document with a document type definition (DTD). In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.