Internationalization Tag Set

Last updated October 06, 2022

The Internationalization Tag Set (ITS) is a set of attributes and elements designed to provide internationalization and localization support in XML documents.^[1]

The ITS specification identifies concepts (called "ITS data categories") that are important for internationalization and localization. It also defines how these concepts are implemented, through a set of elements and attributes grouped in the ITS namespace. XML developers can use this namespace to integrate internationalization features directly into their own XML schemas and documents.

Overview

ITS v1.0 includes seven data categories:

Translate: Defines what parts of a document are translatable or not.
Localization Note: Provides alerts, hints, instructions, or other information to help localizers and translators.
Terminology: Indicates which parts of the document are terms and optionally points to information about these terms.
Directionality: Indicates what type of display directionality should be applied to parts of the document.
Ruby: Indicates what parts of the document should be displayed as ruby text. (Ruby is a short run of text alongside a base text, typically used in East Asian documents to indicate pronunciation or to provide a brief annotation.)
Language Information: Identifies the language of the different parts of the document.
Elements Within Text: Indicates how elements should be treated with regard to linguistic segmentation.

The vocabulary is designed to address two different aspects. First, it provides markup that can be used directly in XML documents. Second, it offers a way to indicate that parts of an existing markup correspond to some of the ITS data categories and should be treated as such by ITS processors.

ITS applies to new document types as well as existing ones. It also applies both to markup without any internationalization features and to document types that already support internationalization or localization-related functions.

ITS can be specified using global rules and local rules.

The global rules are expressed anywhere in the document (embedded global rules), or even outside the document (external global rules), using the its:rules element.
The local rules are expressed by specialized attributes (and sometimes elements) specified inside the document instance, at the location where they apply.
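
As a rough illustration of the two mechanisms, the following Python sketch separates the global rules found in its:rules elements from the local ITS attributes placed on ordinary elements. It uses only the standard xml.etree.ElementTree module. The sample document and the helper names local_name and split_its_markup are assumptions made for this illustration and are not part of the ITS specification.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # ITS 1.0 namespace in ElementTree notation

# Hypothetical sample document, modeled on the ITS examples in this article.
SAMPLE = """<text xmlns:its="http://www.w3.org/2005/11/its">
  <head>
    <title its:translate="yes">The Origins of Modern Novel</title>
    <its:rules version="1.0">
      <its:translateRule translate="no" selector="/text/head"/>
    </its:rules>
  </head>
  <body>
    <p>Quite a <span its:translate="no">faux pas</span>.</p>
  </body>
</text>"""


def local_name(qualified):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return qualified.rsplit("}", 1)[-1]


def split_its_markup(root):
    """Return (global_rules, local_markup) found in the tree, in document order."""
    global_rules = []
    local_markup = []
    for elem in root.iter():
        if elem.tag == ITS + "rules":
            # Each child of its:rules is one global rule.
            for rule in elem:
                global_rules.append((local_name(rule.tag), dict(rule.attrib)))
        # Local markup: attributes in the ITS namespace on any element.
        for name, value in elem.attrib.items():
            if name.startswith(ITS):
                local_markup.append((local_name(elem.tag), local_name(name), value))
    return global_rules, local_markup


global_rules, local_markup = split_its_markup(ET.fromstring(SAMPLE))
print(global_rules)
print(local_markup)
```

The sketch only inventories the markup. It does not evaluate selectors or apply precedence between global and local rules.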

Examples

Translate data category

In the following ITS markup example, the elements and attributes with the its prefix belong to the ITS namespace. The its:rules element lists the rules to apply to this file. It contains a single its:translateRule rule, which indicates that the content of the head element selected by /text/head should not be translated.

The its:translate attributes on individual elements are local markup that takes precedence over the global rule and over inherited values: here they make the content of title translatable and the text "faux pas" non-translatable.

<text xmlns:its="http://www.w3.org/2005/11/its">
 <head>
  <revision>2006-09-10 v5</revision>
  <author>Gerson Chicareli</author>
  <contact>someone@example.com</contact>
  <title its:translate="yes">The Origins of Modern Novel</title>
  <its:rules version="1.0">
   <its:translateRule translate="no" selector="/text/head"/>
  </its:rules>
 </head>
 <body>
  <div xml:id="intro">
   <head>Introduction</head>
   <p>It would certainly be quite a <span its:translate="no">faux pas</span> to start a dissertation on the origin of modern novel without mentioning the <tl>HKLM of GFDL</tl>...</p>
  </div>
 </body>
</text>
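
How a processor might resolve the translate flag for this example can be sketched in Python. This is a minimal sketch under stated assumptions, not a conforming ITS processor: it assumes a simplified precedence (a local its:translate attribute wins over a matching global rule, which wins over the value inherited from the parent, which wins over the default "yes" for elements), it ignores attributes as translation targets, and it relies on the limited XPath subset of the standard xml.etree.ElementTree module. The helper names select and resolve_translate are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # ITS 1.0 namespace in ElementTree notation

DOC = """<text xmlns:its="http://www.w3.org/2005/11/its">
 <head>
  <revision>2006-09-10 v5</revision>
  <title its:translate="yes">The Origins of Modern Novel</title>
  <its:rules version="1.0">
   <its:translateRule translate="no" selector="/text/head"/>
  </its:rules>
 </head>
 <body>
  <div xml:id="intro">
   <head>Introduction</head>
   <p>It would certainly be quite a <span its:translate="no">faux pas</span> to start.</p>
  </div>
 </body>
</text>"""


def select(root, selector):
    """Evaluate an absolute selector such as '/text/head' or '//msg/data'.

    ElementTree cannot evaluate absolute paths on an element, so the root is
    hung under a synthetic parent and the path is made relative to it.
    """
    holder = ET.Element("holder")
    holder.append(root)
    return holder.findall("." + selector)


def resolve_translate(root):
    """Map every element to 'yes' or 'no' using the simplified precedence."""
    # Global rules in document order: a later rule overrides an earlier one.
    by_rule = {}
    for rule in root.iter(ITS + "translateRule"):
        for node in select(root, rule.get("selector")):
            by_rule[node] = rule.get("translate")

    result = {}

    def walk(elem, inherited):
        value = by_rule.get(elem, inherited)          # global rule beats inheritance
        value = elem.get(ITS + "translate", value)    # local attribute beats both
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, "yes")  # default for element content
    return result


flags = resolve_translate(ET.fromstring(DOC))
for elem, value in flags.items():
    if not elem.tag.startswith(ITS):
        print(elem.tag, repr((elem.text or "").strip()), value)
```

Under these assumptions, the top-level head and its revision child resolve to "no", title resolves to "yes" because of its local attribute, and the nested head inside div stays translatable because /text/head does not select it.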

Localization Note data category

In the following ITS markup example, the its:locNoteRule element specifies that any node matching the XPath expression "//msg/data" has an associated note. The location of that note is given by the locNotePointer attribute, which holds a relative XPath expression pointing to the node that contains the note, here "../notes".

Note also the use of the its:translateRule element to mark the notes elements as non-translatable.

<Res xmlns:its="http://www.w3.org/2005/11/its">
 <prolog>
  <its:rules version="1.0">
   <its:translateRule selector="//msg/notes" translate="no"/>
   <its:locNoteRule locNoteType="description" selector="//msg/data" locNotePointer="../notes"/>
  </its:rules>
 </prolog>
 <body>
  <msg id="FileNotFound">
   <notes>Indicates that the resource file {0} could not be loaded.</notes>
   <data>Cannot find the file {0}.</data>
  </msg>
  <msg id="DivByZero">
   <notes>A division by 0 was going to be computed.</notes>
   <data>Invalid parameter.</data>
  </msg>
 </body>
</Res>
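
The pairing of each data node with its note can be sketched in Python as follows. This is an illustrative sketch, not a full ITS processor: the helper name collect_loc_notes is hypothetical, and because the standard xml.etree.ElementTree module cannot step above the start element, the code only handles pointers of the form "../name" by looking up the parent through a parent map.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # ITS 1.0 namespace in ElementTree notation

DOC = """<Res xmlns:its="http://www.w3.org/2005/11/its">
 <prolog>
  <its:rules version="1.0">
   <its:translateRule selector="//msg/notes" translate="no"/>
   <its:locNoteRule locNoteType="description" selector="//msg/data" locNotePointer="../notes"/>
  </its:rules>
 </prolog>
 <body>
  <msg id="FileNotFound">
   <notes>Indicates that the resource file {0} could not be loaded.</notes>
   <data>Cannot find the file {0}.</data>
  </msg>
  <msg id="DivByZero">
   <notes>A division by 0 was going to be computed.</notes>
   <data>Invalid parameter.</data>
  </msg>
 </body>
</Res>"""


def collect_loc_notes(root):
    """Map each selected element to (note type, note text)."""
    # A synthetic parent lets ElementTree evaluate selectors that start at the
    # document root, such as '//msg/data'.
    holder = ET.Element("holder")
    holder.append(root)
    # ElementTree elements have no parent pointer, so build a lookup table.
    parent_of = {child: parent for parent in holder.iter() for child in parent}

    notes = {}
    for rule in root.iter(ITS + "locNoteRule"):
        pointer = rule.get("locNotePointer")
        for node in holder.findall("." + rule.get("selector")):
            # Simplification (assumption): only '../name' pointers are handled,
            # by evaluating 'name' relative to the parent of the selected node.
            if pointer and pointer.startswith("../"):
                target = parent_of[node].find(pointer[3:])
                if target is not None:
                    notes[node] = (rule.get("locNoteType"), target.text)
    return notes


for node, (note_type, text) in collect_loc_notes(ET.fromstring(DOC)).items():
    print(f"{node.text!r} -> [{note_type}] {text}")
```

A library with full XPath support would evaluate the locNotePointer expression directly against each selected node. The parent map is a workaround chosen here to keep the sketch within the standard library.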

ITS limitations

ITS does not solve every XML internationalization and localization issue.

One reason is that version 1.0 does not define a data category for every need. For example, it provides no way to indicate the source/target relationship in bilingual files, where some parts of a document store the source text and other parts store the corresponding translation.

The other reason is that many aspects of internationalization cannot be resolved with markup, because they depend on the design of the DTD or the schema itself. Best practices and design and authoring guidelines help ensure that documents are correctly internationalized and easy to localize. For example, storing translatable text in attributes is a bad idea for many reasons, but ITS cannot prevent an XML developer from making such a choice.

Some of the ITS 1.0 limitations are addressed in version 2.0. See http://www.w3.org/TR/its20/ for more details.

References

[1] Internationalization Tag Set (ITS)
