This article needs to be updated.(March 2015) |
An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases which are in turn a category of NoSQL database.
There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include: [1] [2]
Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". [4] [ needs update ] It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format. In content-based applications, the ability of the native XML database also minimizes the need for extraction or entry of metadata to support searching and navigation.
XML-enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:
RDBMS that support the ISO XML Type are:
Typically an XML-enabled database is best suited where the majority of data are non-XML. For datasets where the majority of data are XML, a native XML database is better suited.
selectid,vol,xmlquery('$j/name',passingjournalas"j")asnamefromjournalswherexmlexists('$j[licence="CreativeCommons"]',passingjournalas"j")
Native XML databases are especially tailored for working with XML data. As managing XML as large strings would be inefficient, and due to the hierarchical nature of XML, custom optimized data structures are used for storage and querying. This usually increases performance both in terms of read-only queries and updates. [11] XML nodes and documents are the fundamental unit of (logical) storage, just as a relational database has fields and rows.
The standard for querying XML data per W3C recommendation is XQuery; the latest version is XQuery 3.1. [12] XQuery includes XPath as a sub-language and XML itself is a valid sub-syntax of XQuery. In addition to XPath, some XML databases support XSLT as a method of transforming documents or query results retrieved from the database.
Name | License | Native Language | XQuery 3.1 | XQuery 3.0 | XQuery 1.0 | XQuery Update | XQuery Full Text | EXPath Extensions | EXQuery Extensions | XSLT 2.0 | XForms 1.1 | XProc 1.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BaseX | BSD | Java | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
eXist | GNU LGPL | Java | Partial | Partial | Yes | Proprietary | Proprietary | Yes | Yes | Yes | Yes | Yes |
MarkLogic Server | Commercial | C++ | No | Partial | Yes | Proprietary | Proprietary | No | No | Yes | Yes | No |
OpenText xDB | Commercial | Java | Partial | Partial | Yes | Yes | Yes | No | No | No | No | No |
Oracle Berkeley DB XML | Commercial | C/C++ | ||||||||||
Qizx | Commercial | Java | No | No | Yes | Yes | Yes | No | No | Yes | No | No |
Sedna | Apache License 2.0 | C/C++ |
Name | XQJ | XML:DB | RESTful | RESTXQ | WebDAV |
---|---|---|---|---|---|
BaseX | Yes | Yes | Yes | Yes | Yes |
eXist | Yes | Yes | Yes | Yes | Yes |
MarkLogic Server | Yes | No | Yes | Yes | Yes |
Qizx | No | No | Yes | No | No |
Sedna | Yes | Yes | No | No | No |
For data-centric XML datasets, the unique and distinct keyword search method, namely, XDMA [13] for XML databases is designed and developed based on dual indexing and mutual summation.
An object database or object-oriented database is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented. A third type, object–relational databases, is a hybrid of both approaches. Object databases have been considered since the early 1980s.
Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.
Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.
A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases can be uni-temporal, bi-temporal or tri-temporal.
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.
ADO.NET is a data access technology from the Microsoft .NET Framework that provides communication between relational and non-relational systems through a common set of components. ADO.NET is a set of computer software components that programmers can use to access data and data services from a database. It is a part of the base class library that is included with the Microsoft .NET Framework. It is commonly used by programmers to access and modify data stored in relational database systems, though it can also access data in non-relational data sources. ADO.NET is sometimes considered an evolution of ActiveX Data Objects (ADO) technology, but was changed so extensively that it can be considered an entirely new product.
The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.
In relational databases, the information schema is an ANSI-standard set of read-only views that provide information about all of the tables, views, columns, and procedures in a database. It can be used as a source of the information that some databases make available through non-standard commands, such as:
=> SELECT count(table_name) FROM information_schema.tables; count ------- 99 => SELECT column_name, data_type, column_default, is_nullable FROM information_schema.columns WHERE table_name='alpha'; column_name | data_type | column_default | is_nullable -------------+-----------+----------------+------------- foo | integer | | YES bar | character | | YES => SELECT * FROM information_schema.information_schema_catalog_name; catalog_name -------------- johnd
SQL/XML or XML-Related Specifications is part 14 of the Structured Query Language (SQL) specification. In addition to the traditional predefined SQL data types like NUMERIC, CHAR, TIMESTAMP, ...
it introduces the predefined data type XML
together with constructors, several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in a SQL database.
Transaction Processing over XML (TPoX) is a computing benchmark for XML database systems. As a benchmark, TPoX is used for the performance testing of database management systems that are capable of storing, searching, modifying and retrieving XML data. The goal of TPoX is to allow database designers, developers and users to evaluate the performance of XML database features, such as the XML query languages XQuery and SQL/XML, XML storage, XML indexing, XML Schema support, XML updates, transaction processing and logging, and concurrency control. TPoX includes XML update tests based on the XQuery Update Facility.
XQuery Update Facility is an extension to the XML Query language, XQuery. It provides expressions that can be used to make changes to instances of the XQuery 1.0 and XPath 2.0 Data Model.
SQL:1999 was the fourth revision of the SQL database query language. It introduced many new features, many of which required clarifications in the subsequent SQL:2003. In the meanwhile SQL:1999 is deprecated.
A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.
XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) in 1999, and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.
pureXML is the native XML storage feature in the IBM Db2 data server. pureXML provides query languages, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that Db2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format.
DatabaseSpy is a multi-database query, design, and database comparison tool from Altova, the creator of XMLSpy. DatabaseSpy connects to many major relational databases, facilitating SQL querying, database structure design, database content editing, and database comparison and conversion.
Sedna is an open-source database management system that provides native storage for XML data. The distinctive design decisions employed in Sedna are (i) schema-based clustering storage strategy for XML data and (ii) memory management based on layered address space.
XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.
SQL:2011 or ISO/IEC 9075:2011 is the seventh revision of the ISO (1987) and ANSI (1986) standard for the SQL database query language. It was formally adopted in December 2011. The standard consists of 9 parts which are described in detail in SQL. The next version is SQL:2016.
JSONiq is a query and functional programming language that is designed to declaratively query and transform collections of hierarchical and heterogeneous data in format of JSON, XML, as well as unstructured, textual data.