XML database

Last updated July 23, 2022

An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases which are in turn a category of NoSQL database.

Rationale for XML in databases

There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include:^[1]^[2]

An enterprise may have a lot of XML in an existing standard format
Data may need to be exposed or ingested as XML, so using another format such as relational forces double-modeling of the data
XML is very well suited to sparse data, deeply nested data and mixed content (such as text with embedded markup tags)
XML is human readable whereas relational tables require expertise to access
Metadata is often available as XML
Semantic web data is available as RDF/XML
Provides a solution for Object-relational impedance mismatch ^[3]

Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa".^[4]^{[ needs update ]} It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format. In content-based applications, the ability of the native XML database also minimizes the need for extraction or entry of metadata to support searching and navigation.

XML enabled databases

XML enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:

XML is stored into a CLOB (Character large object)
XML is `shredded` into a series of Tables based on a Schema^[5]
XML is stored into a native XML Type as defined by ISO Standard 9075-14 ^[6]

RDBMS that support the ISO XML Type are:

IBM DB2 (pureXML^[7])
Microsoft SQL Server^[8]
Oracle Database^[9]
PostgreSQL^[10]

Typically an XML enabled database is best suited where the majority of data are non-XML. For datasets where the majority of data are XML, a native XML database is better suited.

Example of XML Type Query in IBM DB2 SQL

selectid,vol,xmlquery('$j/name',passingjournalas"j")asnamefromjournalswherexmlexists('$j[licence="CreativeCommons"]',passingjournalas"j")

Native XML databases

Native XML databases are especially tailored for working with XML data. As managing XML as large strings would be inefficient, and due to the hierarchical nature of XML, custom optimized data structures are used for storage and querying. This usually increases performance both in terms of read-only queries and updates.^[11] XML nodes and documents are the fundamental unit of (logical) storage, just as a relational database has fields and rows.

The standard for querying XML data per W3C recommendation is XQuery; the latest version is XQuery 3.1.^[12] XQuery includes XPath as a sub-language and XML itself is a valid sub-syntax of XQuery. In addition to XPath, some XML databases support XSLT as a method of transforming documents or query results retrieved from the database.

Language features

Name	License	Native Language	XQuery 3.1	XQuery 3.0	XQuery 1.0	XQuery Update	XQuery Full Text	EXPath Extensions	EXQuery Extensions	XSLT 2.0	XForms 1.1	XProc 1.0
BaseX	BSD	Java	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
eXist	GNU LGPL	Java	Partial	Partial	Yes	Proprietary	Proprietary	Yes	Yes	Yes	Yes	Yes
MarkLogic Server	Commercial	C++	No	Partial	Yes	Proprietary	Proprietary	No	No	Yes	Yes	No
OpenText xDB	Commercial	Java	Partial	Partial	Yes	Yes	Yes	No	No	No	No	No
Oracle Berkeley DB XML	Commercial
Qizx	Commercial	Java	No	No	Yes	Yes	Yes	No	No	Yes	No	No
Sedna	Apache License 2.0

Supported APIs

Name	XQJ	XML:DB	RESTful	RESTXQ	WebDAV
BaseX	Yes	Yes	Yes	Yes	Yes
eXist	Yes	Yes	Yes	Yes	Yes
MarkLogic Server	Yes	No	Yes	Yes	Yes
Qizx	No	No	Yes	No	No
Sedna	Yes	Yes	No	No	No

Data-centric XML datasets

For data-centric XML datasets, the unique and distinct keyword search method, namely, XDMA^[13] for XML databases is designed and developed based on dual indexing and mutual summation.

Related Research Articles

An object database or object-oriented database is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented. Object–relational databases are a hybrid of both approaches.

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

Query languages, data query languages or database query languages (DQLs) are computer languages used to make queries in databases and information systems. A well known example is the Structured Query Language (SQL).

A physical data model is a representation of a data design as implemented, or intended to be implemented, in a database management system. In the lifecycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation. A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system.

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases could be uni-temporal, bi-temporal or tri-temporal.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

In relational databases, the information schema is an ANSI-standard set of read-only views that provide information about all of the tables, views, columns, and procedures in a database. It can be used as a source of the information that some databases make available through non-standard commands, such as:

 => SELECT count(table_name) FROM information_schema.tables;  count   -------  99    => SELECT column_name, data_type, column_default, is_nullable  FROM information_schema.columns WHERE table_name='alpha';  column_name | data_type | column_default | is_nullable   -------------+-----------+----------------+-------------  foo | integer | | YES  bar | character | | YES    => SELECT * FROM information_schema.information_schema_catalog_name;  catalog_name   --------------  johnd

SQL/XML or XML-Related Specifications is part 14 of the Structured Query Language (SQL) specification. In addition to the traditional predefined SQL data types like NUMERIC, CHAR, TIMESTAMP, ... it introduces the predefined data type XML together with constructors, several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in a SQL database.

XQuery Update Facility is an extension to the XML Query language, XQuery. It provides expressions that can be used to make changes to instances of the XQuery 1.0 and XPath 2.0 Data Model.

SQL:1999 was the fourth revision of the SQL database query language. It introduced many new features, many of which required clarifications in the subsequent SQL:2003. In the meanwhile SQL:1999 is deprecated.

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

pureXML is the native XML storage feature in the IBM Db2 data server. pureXML provides query languages, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that Db2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format.

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

DatabaseSpy SQL database profiling tool and GUI

DatabaseSpy is a multi-database query, design, and database comparison tool from Altova, the creator of XMLSpy. DatabaseSpy connects to many major relational databases, facilitating SQL querying, database structure design, database content editing, and database comparison and conversion.

Sedna is an open-source database management system that provides native storage for XML data. The distinctive design decisions employed in Sedna are (i) schema-based clustering storage strategy for XML data and (ii) memory management based on layered address space.

XQuery API for Java Application programming interface

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

SQL:2011 or ISO/IEC 9075:2011 is the seventh revision of the ISO (1987) and ANSI (1986) standard for the SQL database query language. It was formally adopted in December 2011. The standard consists of 9 parts which are described in detail in SQL. The next version is SQL:2016.

JSONiq is a query and functional programming language that is designed to declaratively query and transform collections of hierarchical and heterogeneous data in format of JSON, XML, as well as unstructured, textual data.

References

↑ Nicola, Matthias (28 September 2010). "5 Reasons for Storing XML in a Database". Native XML Database. Retrieved 17 March 2015.
↑ Feldman, Damon (11 April 2013). Moving from Relational Modeling to XML and MarkLogic Data Models. MarkLogic World . Retrieved 17 March 2015.
↑ [NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley Educational Publishers Inc, 2009] ISBN 978-0321826626
↑ O'Connell, Steve (2005). Section 9.2. Advanced Databases Course Notes (Syllabus). Southampton, England: University of Southampton.
↑ "XML Schema Storage and Query: Basic". Oracle XML DB Developer's Guide, 10g Release 2. Oracle Corporation. August 2005. Retrieved 17 March 2015.. Section Creating XMLType Tables and Columns Based on XML Schema
↑ "ISO/IEC 9075-14:2011: Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications (SQL/XML)". International Organization for Standardization. 2011. Retrieved 17 March 2015.
↑ "pureXML overview -- DB2 as an XML database". IBM Knowledge Center. IBM . Retrieved 17 March 2015.
↑ "Using XML in SQL Server". Microsoft Developer Network. Microsoft Corporation . Retrieved 17 March 2015.
↑ "XMLType Operations". Oracle XML DB Developer's Guide, 10g Release 2. Oracle Corporation. August 2005. Retrieved 17 March 2015.
↑ "8.13. XML Type". PostgreSQL 9.6 Documentation. Retrieved 1 April 2017.
↑ Matthias, Nicola (22 August 2010). "XML versus Relational Database Performance". Native XML Database. Retrieved 28 Jun 2017.
↑ "XQuery 3.1 Recommendation". 2017-03-21.
↑ Selvaganesan, S.; Haw, Su-Cheng; Soon, Lay-Ki (2014). "XDMA: A Dual Indexing and Mutual Summation Based Keyword Search Algorithm for XML Databases". International Journal of Software Engineering and Knowledge Engineering. 24 (4): 591–615. doi:10.1142/s0218194014500223.

External links

Ranking of Native XML DBMS by popularity, updated monthly, from DB-Engines
XML Databases - The Business Case, Charles Foster, June 2008 - Talks about the current state of Databases and data persistence, how the current Relational Database model is starting to crack at the seams and gives an insight into a strong alternative for today's requirements.
An XML-based Database of Molecular Pathways (2005-06-02) Speed / Performance comparisons of eXist, X-Hive, Sedna and Qizx/open
XML Native Database Systems: Review of Sedna, Ozone, NeoCoreXMS 2006
XML Data Stores: Emerging Practices
Bhargava, P.; Rajamani, H.; Thaker, S.; Agarwal, A. (2005) XML Enabled Relational Databases, Texas, The University of Texas at Austin.
Initiative for XML Databases
XML and Databases, Ronald Bourret, September 2005
The State of Native XML Databases, Elliotte Rusty Harold, August 13, 2007

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[nicola2010-1] Nicola, Matthias (28 September 2010). "5 Reasons for Storing XML in a Database". Native XML Database. Retrieved 17 March 2015.

[feldman2013-2] Feldman, Damon (11 April 2013). Moving from Relational Modeling to XML and MarkLogic Data Models. MarkLogic World . Retrieved 17 March 2015.

[3] [NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley Educational Publishers Inc, 2009] ISBN 978-0321826626

[oconnell2005-4] O'Connell, Steve (2005). Section 9.2. Advanced Databases Course Notes (Syllabus). Southampton, England: University of Southampton.

[oracle-5] "XML Schema Storage and Query: Basic". Oracle XML DB Developer's Guide, 10g Release 2. Oracle Corporation. August 2005. Retrieved 17 March 2015.. Section Creating XMLType Tables and Columns Based on XML Schema

[iso9075-2011-6] "ISO/IEC 9075-14:2011: Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications (SQL/XML)". International Organization for Standardization. 2011. Retrieved 17 March 2015.

[db2purexml-7] "pureXML overview -- DB2 as an XML database". IBM Knowledge Center. IBM . Retrieved 17 March 2015.

[sqlserver2005-8] "Using XML in SQL Server". Microsoft Developer Network. Microsoft Corporation . Retrieved 17 March 2015.

[oracle2-9] "XMLType Operations". Oracle XML DB Developer's Guide, 10g Release 2. Oracle Corporation. August 2005. Retrieved 17 March 2015.

[postgresql-10] "8.13. XML Type". PostgreSQL 9.6 Documentation. Retrieved 1 April 2017.

[11] Matthias, Nicola (22 August 2010). "XML versus Relational Database Performance". Native XML Database. Retrieved 28 Jun 2017.

[12] "XQuery 3.1 Recommendation". 2017-03-21.

[13] Selvaganesan, S.; Haw, Su-Cheng; Soon, Lay-Ki (2014). "XDMA: A Dual Indexing and Mutual Summation Based Keyword Search Algorithm for XML Databases". International Journal of Software Engineering and Knowledge Engineering. 24 (4): 591–615. doi:10.1142/s0218194014500223.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

v t e Database models
Common models	Flat Hierarchical Dimensional Network Relational Entity–relationship Enhanced Graph Object-oriented Entity–attribute–value
Other models	Correlational Multidimensional Array Semantic Star schema XML database
Implementations	Flat file Column-oriented Document-oriented Object–relational Deductive Temporal XML data store Key–value store Triplestore

v t e Database management systems
Types	Object-oriented comparison Relational list comparison Key–value Column-oriented list Document-oriented Wide-column store Graph NoSQL NewSQL In-memory list Multi-model comparison Cloud
Concepts	Database ACID Armstrong's axioms Codd's 12 rules CAP theorem CRUD Null Candidate key Foreign key Superkey Surrogate key Unique key
Objects	Relation table column row View Transaction Transaction log Trigger Index Stored procedure Cursor Partition
Components	Concurrency control Data dictionary JDBC XQJ ODBC Query language Query optimizer Query rewriting system Query plan
Functions	Administration Query optimization Replication Sharding
Related topics	Database models Database normalization Database storage Distributed database Federated database system Referential integrity Relational algebra Relational calculus Relational model Object–relational database Transaction processing
Category Outline WikiProject