Transaction Processing over XML

Last updated

Transaction Processing over XML (TPoX) is a computing benchmark for XML database systems. As a benchmark, TPoX is used for the performance testing of database management systems that are capable of storing, searching, modifying and retrieving XML data. The goal of TPoX is to allow database designers, developers and users to evaluate the performance of XML database features, such as the XML query languages XQuery and SQL/XML, XML storage, XML indexing, XML Schema support, XML updates, transaction processing and logging, and concurrency control. TPoX includes XML update tests based on the XQuery Update Facility.

The TPoX benchmark exercises the processing of data-centric XML, in contrast to content- or document-centric XML.

TPoX was originally developed and tested by IBM and Intel, but became an open source project on SourceForge in January 2007. TPoX 1.1 was released in June 2007. TPoX 2.0 was released in July 2009.

The TPoX benchmark package contains the following:

The TPoX workload consists of seven XML queries, two inserts, two deletes, and six XML update operations. The primary performance metric of the benchmark is TTPS (TPoX Transactions Per Second) which is the throughput of the multi-user read/write workload at a given scale factor. The smallest TPoX scale factor uses 10GB of raw XML documents, the largest uses 1PB of raw XML documents.

Related Research Articles

Database Organized collection of data in computing

In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance.

PostgreSQL Free and open-source relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.

IBM Db2 Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases which are in turn a category of NoSQL database.

In online transaction processing (OLTP), information systems typically facilitate and manage transaction-oriented applications.

SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March, 2013.

Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

pureXML is the native XML storage feature in the IBM Db2 data server. pureXML provides query languages, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that Db2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format.

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called Not only SQL to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

In transaction processing, the Telecommunication Application Transaction Processing Benchmark (TATP) is a benchmark designed to measure the performance of in-memory database transaction systems.

Versant Object Database (VOD) is an object database software product developed by Versant Corporation.

XQuery API for Java Application programming interface

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

BaseX

BaseX is a native and light-weight XML database management system and XQuery processor, developed as a community project on GitHub. It is specialized in storing, querying, and visualizing large XML documents and collections. BaseX is platform-independent and distributed under the BSD-3-Clause license.

The following is provided as an overview of and topical guide to databases:

Stylus Studio is an integrated development environment (IDE) for the Extensible Markup Language (XML). It consists of a variety of tools and visual designers to edit and transform XML documents and legacy data such as electronic data interchange (EDI), comma-separated values (CSV) and relational data.

Oracle NoSQL Database

Oracle NoSQL Database (ONDB) is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

Azure Cosmos DB is Microsoft's proprietary globally distributed, multi-model database service "for managing data at planet-scale" launched in May 2017. It is schema-agnostic, horizontally scalable, and generally classified as a NoSQL database.

TPC-C Benchmark used to compare the performance of OLTP systems

TPC-C, short for Transaction Processing Performance Council Benchmark C, is a benchmark used to compare the performance of online transaction processing (OLTP) systems. This industry standard was published in August 1992, and eventually replaced the earlier TPC-A, which was declared obsolete in 1995. It has undergone a number of changes to keep it relevant as computer performance grew by several orders of magnitude, with the current version as of 2021, 5.11, released in 2010. In 2006, a newer OLTP benchmark was added to the suite, TPC-E, but TPC-C remains in widespread use.

References