XML catalog

XML documents typically refer to external entities, for example the public and/or system ID for the Document Type Definition. These external relationships are expressed using URIs, typically as URLs.

However, absolute URLs only work when the network can reach them. Relying on remote resources makes XML processing susceptible to both planned and unplanned network downtime.

Relative URLs are only useful in the context where they were originally created. For example, the URL "../../xml/dtd/docbookx.xml" resolves correctly only from a document that sits at the expected depth in the same directory layout.

One way to avoid these problems is to use an entity resolver (a standard part of SAX) or a URI Resolver (a standard part of JAXP). A resolver can examine the URIs of the resources being requested and determine how best to satisfy those requests. The XML catalog is a document describing a mapping between external entity references and locally cached equivalents. [1]
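To make the resolver idea concrete, here is a minimal sketch of a hand-written SAX entity resolver. The class name LocalDtdResolver and its in-memory map are hypothetical illustrations; a real catalog resolver reads the mapping from a catalog file instead of taking it as a constructor argument.

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: maps known public IDs to locally cached copies. */
public class LocalDtdResolver implements EntityResolver {
    // Public ID -> URI of the local copy (illustrative mapping, supplied by the caller).
    private final Map<String, String> localCopies;

    public LocalDtdResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Immutable maps reject null keys, so guard against a missing public ID.
        String local = (publicId == null) ? null : localCopies.get(publicId);
        if (local == null) {
            // Returning null tells the parser to open the system ID itself.
            return null;
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }
}
```

In practice the mapped values would normally be absolute URIs (for example file: URIs), since the parser has to open whatever system ID the resolver hands back.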

Example Catalog.xml

The following simple catalog shows how one might provide locally cached DTDs for an XHTML page validation tool.

<?xml version="1.0"?>
<!DOCTYPE catalog
    PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
           "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
    <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
            uri="dtd/xhtml1/xhtml1-strict.dtd"/>
    <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
            uri="dtd/xhtml1/xhtml1-transitional.dtd"/>
    <public publicId="-//W3C//DTD XHTML 1.1//EN"
            uri="dtd/xhtml11/xhtml11-flat.dtd"/>
</catalog>

This catalog makes it possible to resolve -//W3C//DTD XHTML 1.0 Strict//EN to the local URI dtd/xhtml1/xhtml1-strict.dtd. Similarly, it provides local URIs for two other public IDs.

Note that the document above includes a DOCTYPE – this may cause the parser to attempt to access the system ID URL for the DOCTYPE (i.e. http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd ) before the catalog resolver is fully functioning, which is probably undesirable. To prevent this, simply remove the DOCTYPE declaration.

The following example shows this, and also shows the equivalent <system/> declarations as an alternative to <public/> declarations.

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
            uri="dtd/xhtml1/xhtml1-strict.dtd"/>
    <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
            uri="dtd/xhtml1/xhtml1-transitional.dtd"/>
    <system systemId="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
            uri="dtd/xhtml11/xhtml11-flat.dtd"/>
</catalog>
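The lookups that such a catalog describes can be tried directly. The following is a sketch, assuming Java 11 or later and using the JDK's own javax.xml.catalog API rather than any resolver named in this article. The class name CatalogLookupDemo, the temporary file, and the unknown.dtd identifier are illustrative assumptions; the catalog content is a shortened version of the examples above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

public class CatalogLookupDemo {
    // One <public> and one <system> entry, both pointing at the same local DTD.
    static final String CATALOG_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
      + "  <public publicId=\"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
      + "          uri=\"dtd/xhtml1/xhtml1-strict.dtd\"/>\n"
      + "  <system systemId=\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"\n"
      + "          uri=\"dtd/xhtml1/xhtml1-strict.dtd\"/>\n"
      + "</catalog>\n";

    /** Writes the catalog to a temporary file and loads it with default features. */
    static Catalog loadDemoCatalog() throws IOException {
        Path file = Files.createTempFile("catalog", ".xml");
        Files.writeString(file, CATALOG_XML);
        return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
    }

    public static void main(String[] args) throws IOException {
        Catalog catalog = loadDemoCatalog();
        // Each call returns the mapped URI, or null when nothing matches.
        System.out.println(catalog.matchPublic("-//W3C//DTD XHTML 1.0 Strict//EN"));
        System.out.println(catalog.matchSystem("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));
        System.out.println(catalog.matchSystem("http://example.org/unknown.dtd"));
    }
}
```

The first two lookups should both lead to dtd/xhtml1/xhtml1-strict.dtd, expressed relative to wherever the catalog file lives, while the third has no entry and yields null.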

Using a catalog – Java SAX example

Catalog resolvers are available for various programming languages. The following example shows how, in Java, a SAX parser may be created to parse some input source in which the org.apache.xml.resolver.tools.CatalogResolver is used to resolve external entities to locally cached instances. This resolver originates from Apache Xerces but is now included with the Sun Java runtime.

A SAXParser must be created in the standard way, using factories. The entity resolver of its XML reader should then be set either to the default CatalogResolver or to a custom implementation.

final SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
final XMLReader reader = saxParser.getXMLReader();
final ContentHandler handler = ...;
final InputSource input = ...;
reader.setEntityResolver(new CatalogResolver());
reader.setContentHandler(handler);
reader.parse(input);

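It is important to call the parse method on the reader, not on the SAX parser. The parse methods of SAXParser take their own handler argument and install it on the underlying reader as the entity resolver, which would replace the catalog resolver set above.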
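The same pattern can be exercised end to end without network access. The sketch below is an assumption-laden variant, not the Apache-based setup described above: it substitutes javax.xml.catalog.CatalogResolver (available in the JDK since Java 9) for org.apache.xml.resolver.tools.CatalogResolver, and assumes Java 11 or later for Files.writeString. The class name CatalogSaxDemo, the note.dtd file, and the http://example.invalid system ID are all hypothetical.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CatalogSaxDemo {
    /** Parses a document whose DTD comes from a local catalog; returns its text content. */
    static String parseWithCatalog() throws Exception {
        // Local DTD copy plus a catalog mapping the unreachable system ID to it.
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.writeString(dir.resolve("note.dtd"),
            "<!ELEMENT note (#PCDATA)>\n<!ENTITY greeting \"hello\">\n");
        Files.writeString(dir.resolve("catalog.xml"),
            "<?xml version=\"1.0\"?>\n"
          + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
          + "  <system systemId=\"http://example.invalid/note.dtd\" uri=\"note.dtd\"/>\n"
          + "</catalog>\n");
        CatalogResolver resolver = CatalogManager.catalogResolver(
            CatalogFeatures.defaults(), dir.resolve("catalog.xml").toUri());

        // The entity &greeting; is declared only in the external DTD.
        String doc = "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">\n"
          + "<note>&greeting;</note>";

        // Collect character data so the caller can see the expanded entity.
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(doc))); // parse on the reader
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithCatalog());
    }
}
```

If the catalog is consulted as intended, the parser reads note.dtd from the temporary directory and the returned text is the expansion of the greeting entity. Without the resolver, the parser would try to fetch the DTD from the unreachable host.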

References

  1. Walsh, Norman (7 October 2005). "XML Catalogs OASIS Standard V1.1" (PDF). OASIS. Retrieved 4 November 2023.