Topic map

Last updated

TopicMapKeyConcepts2.PNG

A topic map is a standard for the representation and interchange of knowledge, with an emphasis on the findability of information. Topic maps were originally developed in the late 1990s as a way to represent back-of-the-book index structures so that multiple indexes from different sources could be merged. However, the developers quickly realized that with a little additional generalization, they could create a meta-model with potentially far wider application. The ISO standard is formally known as ISO/IEC 13250:2003.

Contents

A topic map represents information using

Topic maps are similar to concept maps and mind maps in many respects, though only topic maps are ISO standards. Topic maps are a form of semantic web technology similar to RDF.

Ontology and merging

Topics, associations, and occurrences can all be typed, where the types must be defined by the one or more creators of the topic map(s). The definitions of allowed types is known as the ontology of the topic map.

Topic maps explicitly support the concept of merging of identity between multiple topics or topic maps. Furthermore, because ontologies are topic maps themselves, they can also be merged thus allowing for the automated integration of information from diverse sources into a coherent new topic map. Features such as subject identifiers (URIs given to topics) and PSIs (published subject indicators) are used to control merging between differing taxonomies. Scoping on names provides a way to organise the various names given to a particular topic by different sources.

Current standard

The work standardizing topic maps (ISO/IEC 13250) took place under the umbrella of the ISO/IEC JTC1/SC34/WG3 committee (ISO/IEC Joint Technical Committee 1, Subcommittee 34, Working Group 3 – Document description and processing languages – Information Association). [1] [2] [3] However, WG3 was disbanded and maintenance of ISO/IEC 13250 was assigned to WG8.

The topic maps (ISO/IEC 13250) reference model and data model standards are defined independent of any specific serialization or syntax.

Data format

The specification is summarized in the abstract as follows: "This specification provides a model and grammar for representing the structure of information resources used to define topics, and the associations (relationships) between topics. Names, resources, and relationships are said to be characteristics of abstract subjects, which are called topics. Topics have their characteristics within scopes: i.e. the limited contexts within which the names and resources are regarded as their name, resource, and relationship characteristics. One or more interrelated documents employing this grammar is called a topic map."

XML serialization formats

Note that XTM 1.0 predates and therefore is not compatible with the more recent versions of the (ISO/IEC 13250) standard.

Other formats

Other proposed or standardized serialization formats include:

The above standards are all recently proposed or defined as part of ISO/IEC 13250. As described below, there are also other, serialization formats such as LTM, AsTMa= that have not been put forward as standards.

Linear topic map notation (LTM) serves as a kind of shorthand for writing topic maps in plain text editors. This is useful for writing short personal topic maps or exchanging partial topic maps by email. The format can be converted to XTM.

There is another format called AsTMa which serves a similar purpose. When writing topic maps manually it is much more compact, but of course can be converted to XTM. Alternatively, it can be used directly with the Perl Module TM (which also supports LTM).

The data formats of XTM and LTM are similar to the W3C standards for RDF/XML or the older N3 notation. [4]

Topic Maps API

A de facto API standard called Common Topic Maps Application Programming Interface (TMAPI) was published in April 2004 and is supported by many Topic Maps implementations or vendors:

Query standard

In normal use it is often desirable to have a way to arbitrarily query the data within a particular Topic Maps store. Many implementations provide a syntax by which this can be achieved (somewhat like 'SQL for Topic Maps') but the syntax tends to vary a lot between different implementations. With this in mind, work has gone into defining a standardized syntax for querying topic maps:

Constraint standards

It can also be desirable to define a set of constraints that can be used to guarantee or check the semantic validity of topic maps data for a particular domain. (Somewhat like database constraints for topic maps). Constraints can be used to define things like 'every document needs an author' or 'all managers must be human'. There are often implementation specific ways of achieving these goals, but work has gone into defining a standardized constraint language as follows:

TMCL is functionally similar to RDF Schema with Web Ontology Language (OWL). [4]

Earlier standards

The "Topic Maps" concept has existed for a long time. The HyTime standard was proposed as far back as 1992 (or earlier?). Earlier versions of ISO 13250 (than the current revision) also exist. More information about such standards can be found at the ISO Topic Maps site.

RDF relationship

Some work has been undertaken to provide interoperability between the W3C's RDF/OWL/SPARQL family of semantic web standards and the ISO's family of Topic Maps standards though the two have slightly different goals.

The semantic expressive power of Topic Maps is, in many ways, equivalent to that of RDF, but the major differences are that Topic Maps (i) provide a higher level of semantic abstraction (providing a template of topics, associations and occurrences, while RDF only provides a template of two arguments linked by one relationship) and (hence) (ii) allow n-ary relationships (hypergraphs) between any number of nodes, while RDF is limited to triplets.

See also

Related Research Articles

Dublin Core Standardized set of metadata elements

The Dublin Core, also known as the Dublin Core Metadata Element Set, is a set of fifteen "core" elements (properties) for describing resources. This fifteen-element Dublin Core has been formally standardized as ISO 15836, ANSI/NISO Z39.85, and IETF RFC 5013. The core properties are part of a larger set of DCMI Metadata Terms. "Dublin Core" is also used as an adjective for Dublin Core metadata, a style of metadata that draws on multiple RDF vocabularies, packaged and constrained in Dublin Core application profiles.

Moving Picture Experts Group working group to set standards for audio and video compression

The Moving Picture Experts Group (MPEG) is a working group of authorities that was formed by ISO and IEC to set standards for audio and video compression and transmission. MPEG is officially a collection of ISO Working Groups and Advisory Groups under ISO/IEC JTC 1/SC 29 – Coding of audio, picture, multimedia and hypermedia information.

Standard Generalized Markup Language Markup language

The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

The Document Style Semantics and Specification Language (DSSSL) is an international standard developed to provide stylesheets for SGML documents.

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.

MPEG-7 is a multimedia content description standard. It was standardized in ISO/IEC 15938. This description will be associated with the content itself, to allow fast and efficient searching for material that is of interest to the user. MPEG-7 is formally called Multimedia Content Description Interface. Thus, it is not a standard which deals with the actual encoding of moving pictures and audio, like MPEG-1, MPEG-2 and MPEG-4. It uses XML to store metadata, and can be attached to timecode in order to tag particular events, or synchronise lyrics to a song, for example.

HyTime is a markup language that is an application of SGML. HyTime defines a set of hypertext-oriented element types that, in effect, supplement SGML and allow SGML document authors to build hypertext and multimedia presentations in a standardized way.

Common Logic (CL) is a framework for a family of logic languages, based on first-order logic, intended to facilitate the exchange and transmission of knowledge in computer-based systems.

The ISO/IEC 11179 Metadata Registry (MDR) standard is an international ISO standard for representing metadata for an organization in a metadata registry. It documents the standardization and registration of metadata to make data understandable and shareable.

Office Open XML is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma, and by the ISO and IEC in later versions.

The Open Document Format for Office Applications, commonly known as OpenDocument, was based on OpenOffice.org XML, as used in OpenOffice.org 1, and was standardised by the Organization for the Advancement of Structured Information Standards (OASIS) consortium.

Terse RDF Triple Language (Turtle) is a syntax and file format for expressing data in the Resource Description Framework (RDF) data model. Turtle syntax is similar to that of SPARQL, an RDF query language. It is a common data format for storing RDF data, along with N-Triples, JSON-LD and RDF/XML.

ISO/IEC JTC 1/SC 34, Document description and processing languages is a subcommittee of the ISO/IEC JTC1 joint technical committee, which is a collaborative effort of both the International Organization for Standardization and the International Electrotechnical Commission, which develops and facilitates standards within the field of document description and processing languages. The international secretariat of ISO/IEC JTC 1/SC 34 is the Japanese Industrial Standards Committee (JISC) located in Japan.

ISO/IEC JTC 1/SC 22 Programming languages, their environments and system software interfaces is a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) that develops and facilitates standards within the fields of programming languages, their environments and system software interfaces. ISO/IEC JTC 1/SC 22 is also sometimes referred to as the "portability subcommittee". The international secretariat of ISO/IEC JTC 1/SC 22 is the American National Standards Institute (ANSI), located in the United States.

Language resource management - Lexical markup framework, is the ISO International Organization for Standardization ISO/TC37 standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication.

The Office Open XML file formats were standardised between December 2006 and November 2008, first by the Ecma International consortium, and subsequently, after a contentious standardization process, by the ISO/IEC's Joint Technical Committee 1.

ISO/IEC JTC 1/SC 37 Biometrics is a standardization subcommittee in the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), which develops and facilitates standards within the field of biometrics. The international secretariat of ISO/IEC JTC 1/SC 37 is the American National Standards Institute (ANSI), located in the United States.

ISO/IEC JTC 1/SC 2 Coded character sets is a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), that develops and facilitates standards within the field of coded character sets. The international secretariat of ISO/IEC JTC 1/SC 2 is the Japanese Industrial Standards Committee (JISC), located in Japan. SC 2 is responsible for the development of the Universal Coded Character Set which is the international standard corresponding to the Unicode Standard.

ISO/IEC JTC 1/SC 31 Automatic identification and data capture techniques is a subcommittee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) Joint Technical Committee (JTC) 1, and was established in 1996. SC 31 develops and facilitates international standards, technical reports, and technical specifications in the field of automatic identification and data capture techniques. The first Plenary established three working groups (WGs): Data Carriers, Data Content, and Conformance. Subsequent Plenaries established other working groups: RFID, RTLS, Mobile Item Identification and Management, Security and File Management, and Applications.

References

  1. ISO JTC1/SC34. "JTC 1/SC 34 – Document Description and Processing Languages". Archived from the original on 6 May 2014. Retrieved 25 December 2009.
  2. "Home of SC34/WG3 – Information Association". 3 June 2008. Retrieved 26 December 2009.
  3. ISO. "JTC 1/SC 34 – Document description and processing languages". ISO. Retrieved 25 December 2009.
  4. 1 2 Lars Marius Garshol (2003). "Living With Topic Maps and RDF" . Retrieved 21 February 2014.

Further reading