Schematron

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML, using a small number of elements together with XPath expressions. In many implementations, the Schematron XML is processed into XSLT code for deployment anywhere that XSLT can be used.

Schematron can express constraints that other XML schema languages, such as XML Schema and DTD, cannot. For example, it can require that the content of an element be controlled by one of its siblings, or that the root element, whatever element that is, carry specific attributes. Schematron can also specify required relationships between multiple XML files. Constraints and content rules may be associated with "plain-English" (or any language) validation error messages, allowing translation of numeric Schematron error codes into meaningful user error messages. Users of Schematron define all the error messages themselves. [1]
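
The two constraints mentioned above can be sketched as follows. This is a minimal illustration, not taken from the standard: the element and attribute names (payment, method, cardNumber, version) are hypothetical. The rules are placed in separate patterns because, within one pattern, a node is tested only against the first rule whose context matches it.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <title>Root element check</title>
    <!-- Matches the root element whatever its name is. -->
    <rule context="/*">
      <assert test="@version">The root element must have a version attribute.</assert>
    </rule>
  </pattern>
  <pattern>
    <title>Co-occurrence check</title>
    <!-- Hypothetical vocabulary: the value of the method child decides
         whether its sibling cardNumber is required. -->
    <rule context="payment[method = 'card']">
      <assert test="cardNumber">A card payment must include a cardNumber element.</assert>
    </rule>
  </pattern>
</schema>
```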

The current ISO recommendation is Information technology, Document Schema Definition Languages (DSDL), Part 3: Rule-based validation, Schematron (ISO/IEC 19757-3:2020).

Uses

Constraints are specified in Schematron using an XPath-based language that can be deployed as XSLT code, making it practical for applications such as the following:

Adjunct to Structural Validation
By testing for co-occurrence constraints, non-regular constraints, and inter-document constraints, Schematron can extend the validations that can be expressed in languages such as DTDs, RELAX NG or XML Schema. [2]
Lightweight Business Rules Engine
Schematron is not a comprehensive Rete rules engine, but it can be used to express rules about complex structures within an XML document.
XML Editor Syntax Highlighting Rules
Some XML editors use Schematron rules to conditionally highlight XML files for errors. Not all XML editors support Schematron.
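
As a sketch of the inter-document constraints mentioned in the list above, a rule can look up values in a second file. The example assumes an XSLT-based implementation, where the XSLT document() function is available, and uses hypothetical names (orderLine, productRef, products.xml). How the relative URI is resolved depends on the processor.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <title>Order lines must reference known products</title>
    <rule context="orderLine">
      <!-- True when some product/@id in the external file equals @productRef. -->
      <assert test="@productRef = document('products.xml')//product/@id">
        Order line refers to product <value-of select="@productRef"/>, which is not listed in products.xml.
      </assert>
    </rule>
  </pattern>
</schema>
```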

Versions

Schematron was invented by Rick Jelliffe while at Academia Sinica Computing Centre, Taiwan. He described Schematron as "a feather duster to reach the parts other schema languages cannot reach".

The most common versions of Schematron are:

Schematron as an ISO Standard

Schematron has been standardized by the ISO as Information technology, Document Schema Definition Languages (DSDL), Part 3: Rule-based validation, Schematron (ISO/IEC 19757-3:2020).

This standard is currently not listed on the ISO Publicly Available Specifications list. Paper versions may be purchased from ISO or national standards bodies.

Schemas that use ISO/IEC FDIS 19757-3 should use the following namespace:

http://purl.oclc.org/dsdl/schematron 
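
A minimal sketch of how the namespace is typically declared: here it is bound to the conventional sch prefix, and a namespace used by the documents under validation is declared with the ns element. The inv prefix, its URI and the element names are hypothetical.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Declares a prefix for use inside XPath expressions in this schema. -->
  <sch:ns prefix="inv" uri="http://example.com/ns/invoice"/>
  <sch:pattern>
    <sch:rule context="inv:invoice">
      <sch:assert test="inv:total">An invoice must contain a total.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```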

Sample rule

Schematron rules can be created using a standard XML editor or XForms application. The following is a sample schema:

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <title>Date rules</title>
    <rule context="Contract">
      <assert test="ContractDate &lt; current-date()">ContractDate should be in the past because future contracts are not allowed.</assert>
    </rule>
  </pattern>
</schema>

This rule checks that the ContractDate element holds a date earlier than the current date. If the assertion fails, validation fails and the body of the assert element is returned to the user as the error message.
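
For illustration, a document such as the following would be rejected by a rule that requires ContractDate to be in the past. The date value is made up.

```xml
<Contract>
  <!-- A date far in the future, so the "must be in the past" check fails. -->
  <ContractDate>2999-01-01</ContractDate>
</Contract>
```

Many implementations report failures in the Schematron Validation Report Language (SVRL). A trimmed sketch of what such a report might contain is shown below. The test attribute echoes the expression used in the schema's assert (written here as a less-than comparison), and the exact form of the location attribute varies between implementations.

```xml
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="Contract"/>
  <svrl:failed-assert test="ContractDate &lt; current-date()" location="/Contract">
    <svrl:text>ContractDate should be in the past because future contracts are not allowed.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
```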

Implementation

Schematron schemas are suitable for use in XML Pipelines, thereby allowing workflow process designers to build and maintain rules using XML manipulation tools. The W3C's XProc pipelining language, for example, has native support for Schematron schema processing through its "validate-with-schematron" step. [4]
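
A minimal XProc 1.0 sketch of the step mentioned above. The schema file name rules.sch is hypothetical. With assert-valid set to true, the pipeline fails when the source document violates the schema; otherwise the document is passed through on the result port.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The document to validate arrives on the pipeline's implicit source port. -->
  <p:validate-with-schematron assert-valid="true">
    <p:input port="schema">
      <p:document href="rules.sch"/>
    </p:input>
  </p:validate-with-schematron>
</p:pipeline>
```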

Since Schematron schemas can be transformed into XSLT stylesheets, these can themselves be used in XML Pipelines which support XSLT transformation. An Apache Ant task can be used to convert Schematron rules into XSLT files.
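
One way to do this, sketched here with Ant's built-in xslt task rather than a dedicated Schematron task, is to run the schema through the stylesheets of the ISO Schematron skeleton implementation. The directory layout and the file names rules.sch and input.xml are assumptions. A schema that uses XPath 2.0 functions would instead need iso_svrl_for_xslt2.xsl and an XSLT 2.0 processor on Ant's classpath.

```xml
<project name="schematron-compile" default="validate" basedir=".">
  <!-- Hypothetical locations; adjust to where the skeleton stylesheets live. -->
  <property name="skeleton.dir" location="lib/iso-schematron"/>
  <property name="build.dir" location="build"/>

  <!-- Step 1: compile the Schematron schema into an XSLT stylesheet. -->
  <target name="compile">
    <mkdir dir="${build.dir}"/>
    <xslt in="rules.sch" out="${build.dir}/rules.xsl"
          style="${skeleton.dir}/iso_svrl_for_xslt1.xsl"/>
  </target>

  <!-- Step 2: apply the generated stylesheet to a document to get an SVRL report. -->
  <target name="validate" depends="compile">
    <xslt in="input.xml" out="${build.dir}/report.svrl.xml"
          style="${build.dir}/rules.xsl"/>
  </target>
</project>
```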

Native Schematron implementations also exist, such as QuiXSchematron, a Java implementation from Innovimax/INRIA that also supports streaming.

References

  1. Siegel, Erik (2022). Schematron: a language for validating XML. Denver, CO: XML Press. ISBN 978-1-937434-81-6.
  2. Fennell, Philip (June 2014). "Schematron - More useful than you'd thought". XML London 2014: 103–112. doi:10.14337/XMLLondon14.Fennell01. ISBN 978-0-9926471-1-7.
  3. Part 3: Rule-based validation — Schematron (ISO/IEC 19757-3:2006) (zip), Information technology — Document Schema Definition Languages (DSDL), ISO/IEC, 2006-06-01, retrieved 2014-06-15
  4. "7.2.5 p:validate-with-schematron". XProc: An XML Pipeline Language. World Wide Web Consortium. 2010-05-11. Retrieved 2012-11-12.