Thesauri and interoperability with other vocabularies Part 1: Thesauri for information retrieval | |
---|---|
Status | Published |
First published | part1: 15 August 2011 part2: 15 March 2013 |
Latest version | part1: First Edition part2: First Edition |
Organization | International Organization for Standardization (ISO) |
Committee | ISO/TC 46/SC 9 |
Series | 01.140.20 (INFORMATION SCIENCES) |
Base standards | ISO 2788, ISO 5964, BS 8723 |
Related standards | ANSI/NISO Z39.19, SKOS |
Domain | Semantics, Thesaurus |
License | ISO |
Copyright | Yes |
Website | part1: www part2: www |
ISO 25964 is the international standard for thesauri, published in two parts as follows:
ISO 25964 Information and documentation - Thesauri and interoperability with other vocabularies
Part 1: Thesauri for information retrieval [published August 2011]
Part 2: Interoperability with other vocabularies [published March 2013]
It was issued by ISO, the International Organization for Standardization, and its official website [1] is maintained by its secretariat at NISO, the US National Information Standards Organization. Each part of the standard can be purchased separately from ISO or from any of its national member bodies (such as ANSI, BSI, AFNOR and DIN). Some parts of it are available free of charge from the official website.
The first international standard for thesauri was ISO 2788, Guidelines for the establishment and development of monolingual thesauri, originally published in 1974 and updated in 1986. In 1985 it was joined by the complementary standard ISO 5964, Guidelines for the establishment and development of multilingual thesauri. Over the years ISO 2788 and ISO 5964 were adopted as national standards in several countries, for example Canada, France and the UK. In the UK they were given the alias numbers BS 5723 and BS 6723 respectively. Work to revise them for the networking needs of the new millennium began in the UK around the turn of the century, and resulted in the publication, between 2005 and 2008, of the five-part British Standard BS 8723, as follows:
BS 8723 Structured vocabularies for information retrieval - Guide
Part 1: Definitions, symbols and abbreviations
Part 2: Thesauri
Part 3: Vocabularies other than thesauri
Part 4: Interoperability between vocabularies
Part 5: Exchange formats and protocols for interoperability
Even before the last part of BS 8723 was published, work began to adopt and adapt it as an international standard to replace ISO 2788 and ISO 5964. The project was led by a Working Group of ISO's Technical Committee 46 (Information and documentation) Subcommittee 9 (Identification and description) known as “ISO TC46/SC9/WG8 Structured Vocabularies”.
ISO 2788 and ISO 5964 were withdrawn in 2011, when they were replaced by the first part of ISO 25964. The second part of ISO 25964 was issued in March 2013, completing the project.
ISO 25964 is for thesauri intended to support information retrieval, and specifically to guide the choice of terms used in indexing, tagging and search queries.
The primary objective is thus summarised in the introduction to the standard as:
"If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved."
Whereas most of the applications envisaged for ISO 2788 and ISO 5964 were databases in a single domain, often in-house or for paper-based systems, ISO 25964 provides additional guidance for the new context of networked applications, including the Semantic Web. A thesaurus is one among several types of controlled vocabulary used in this context.
A thesaurus compliant with ISO 25964-1 (as Part 1 is known) lists all the concepts available for indexing in a given context, and labels each of them with a preferred term, as well as any synonyms that apply. Relationships between the concepts and between the terms are shown, making it easy to navigate around the field while building up a search query. The main types of relationship include:
Equivalence (between synonyms and near-synonyms)
Hierarchical (between broader and narrower concepts)
Associative (between concepts that are closely related in some non-hierarchical way)
In multilingual thesauri equivalence also applies between corresponding terms in different natural languages. Establishing correspondence is not always easy, and the standard provides recommendations for handling the difficulties that commonly arise.
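The structure described above can be illustrated with a minimal Python sketch. It is not taken from the standard: the class names, field names and sample terms are all hypothetical, and it only shows how preferred terms, synonyms and broader or related links might be held together so that an indexer and a searcher arrive at the same term.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One thesaurus concept with its labels and links (illustrative only)."""
    preferred: str                                   # preferred term
    synonyms: set[str] = field(default_factory=set)  # non-preferred terms (equivalence)
    broader: set[str] = field(default_factory=set)   # hierarchical links, by preferred term
    related: set[str] = field(default_factory=set)   # associative links, by preferred term


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._entry: dict[str, str] = {}  # any known term (lower-cased) -> preferred term

    def add(self, concept: Concept) -> None:
        self.concepts[concept.preferred] = concept
        for term in {concept.preferred, *concept.synonyms}:
            self._entry[term.lower()] = concept.preferred

    def preferred_term(self, term: str) -> Optional[str]:
        """Resolve a preferred term or a synonym to the preferred term."""
        return self._entry.get(term.lower())

    def narrower(self, preferred: str) -> list[str]:
        """Concepts that name `preferred` as one of their broader concepts."""
        return sorted(c.preferred for c in self.concepts.values() if preferred in c.broader)


# Hypothetical sample data.
thesaurus = Thesaurus()
thesaurus.add(Concept("vehicles"))
thesaurus.add(Concept("motorcycles", synonyms={"motorbikes", "motor-bikes"}, broader={"vehicles"}))
thesaurus.add(Concept("bicycles", synonyms={"bikes"}, broader={"vehicles"}, related={"motorcycles"}))

# Indexer and searcher start from different words but are guided to the same term.
indexer_choice = thesaurus.preferred_term("Motorbikes")
searcher_choice = thesaurus.preferred_term("motor-bikes")
```

Storing only the broader links and deriving the narrower ones on demand is a simplification chosen for this sketch; it keeps the two directions of the hierarchy from drifting out of step.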
ISO 25964-1 explains how to build a monolingual or a multilingual thesaurus, how to display it, and how to manage its development. There is a data model to use for handling thesaurus data (especially when exchanging data between systems) and an XML schema for encoding the data. Both the model and the schema are available free of charge on the official website hosted by NISO. The standard also sets out the features to look for when choosing software to manage the thesaurus.
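To give a rough idea of what encoding thesaurus data in XML for exchange involves, the following Python sketch writes two concepts to XML and reads them back. The element and attribute names (`thesaurus`, `concept`, `preferredTerm`, `nonPreferredTerm`, `broader`) are invented for illustration and are not those of the ISO 25964 schema, which should be consulted directly for real exchanges.

```python
import xml.etree.ElementTree as ET


def concept_to_xml(identifier, preferred, synonyms=(), broader=()):
    """Build an XML element for one concept (hypothetical element names)."""
    concept = ET.Element("concept", {"id": identifier})
    ET.SubElement(concept, "preferredTerm", {"lang": "en"}).text = preferred
    for synonym in synonyms:
        ET.SubElement(concept, "nonPreferredTerm", {"lang": "en"}).text = synonym
    for broader_id in broader:
        # Relationships refer to other concepts by identifier, not by label.
        ET.SubElement(concept, "broader", {"ref": broader_id})
    return concept


# Sending system: serialise the thesaurus to text.
root = ET.Element("thesaurus")
root.append(concept_to_xml("c1", "vehicles"))
root.append(concept_to_xml("c2", "motorcycles", synonyms=["motorbikes"], broader=["c1"]))
xml_text = ET.tostring(root, encoding="unicode")

# Receiving system: parse the text and recover the same data.
parsed = ET.fromstring(xml_text)
labels = {c.get("id"): c.findtext("preferredTerm") for c in parsed.findall("concept")}
```

Linking concepts by identifier rather than by term is an assumption of this sketch; it means a preferred term can be reworded without breaking the relationships that point to the concept.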
ISO 25964-2 deals with the challenges of using one thesaurus in combination with another, and/or with some other type of controlled vocabulary or knowledge organization system (KOS). The types covered include classification schemes, taxonomies, subject heading schemes, ontologies, name authority lists, terminologies and synonym rings. Within a single organization it is common to find several different such KOSs used in contexts such as the records management system, the library catalogue, the corporate intranet, the research lab, etc. To help users with the challenge of running a single search across all the available collections, ISO 25964-2 provides guidance on mapping between the terms and concepts of one thesaurus and those of the other KOSs. Where mapping is not a sensible option, the standard recommends other forms of complementary vocabulary use.
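As a hedged illustration of how mappings might support a single search across collections, the sketch below translates a term from one vocabulary into the terms of another. The vocabularies, the sample mappings and the three mapping labels (`exact`, `broader`, `narrower`) are assumptions made for this example, not definitions quoted from ISO 25964-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str  # concept in the source vocabulary
    target: str  # concept in the target vocabulary
    kind: str    # "exact", "broader" or "narrower" (illustrative labels)


# Hypothetical mappings from a library thesaurus to an intranet taxonomy.
MAPPINGS = [
    Mapping("motorcycles", "Motorbikes & scooters", "broader"),
    Mapping("bicycles", "Bicycles", "exact"),
    Mapping("vehicles", "Cars", "narrower"),
    Mapping("vehicles", "Bicycles", "narrower"),
]


def translate(term, mappings, allowed=("exact",)):
    """Return target-vocabulary terms mapped from `term`, filtered by mapping kind."""
    return [m.target for m in mappings if m.source == term and m.kind in allowed]


# A strict search follows only exact mappings; a looser one accepts a broader target.
strict = translate("motorcycles", MAPPINGS)
loose = translate("motorcycles", MAPPINGS, allowed=("exact", "broader"))
```

Widening `allowed` trades precision for recall: the looser search reaches documents filed under a broader heading, at the cost of also retrieving material about scooters.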
Similarly, on the Internet there is an opportunity to search simultaneously, on an even wider scale, across repositories and databases that have been indexed with different KOSs. Interoperability between the different networks, platforms, software applications, and languages (both natural and artificial) relies on the adoption of numerous protocols and standards. ISO 25964-2 is the standard that addresses interoperability between structured vocabularies, especially where a thesaurus is involved.
Since Part 1 of ISO 25964 was published it has been adopted by the national standards bodies in a number of countries. For example, the British Standards Institution (BSI) in the UK has adopted it and labelled it unchanged as BS ISO 25964-1. At the time of writing similar consideration is under way for Part 2.
The American standard ANSI/NISO Z39.19 - Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies covers some of the same ground as ISO 25964-1. It deals with monolingual lists, synonym rings and taxonomies as well as thesauri, but does not provide a data model, nor does it address multilingual vocabularies or other aspects of interoperability, such as mapping between KOSs. Where the two standards overlap, they are broadly compatible with each other. NISO is actively involved in both standards, having participated in the work of developing ISO 25964 as well as running its secretariat.
The W3C Recommendation SKOS (Simple Knowledge Organization System) has a close relationship with ISO 25964 in the context of the Semantic Web. SKOS applies to all sorts of “simple KOSs” that can be found on the Web, including thesauri and others. Whereas ISO 25964-1 advises on the selection and fitting together of concepts, terms and relationships to make a good thesaurus, SKOS addresses the next step: porting the thesaurus to the Web. And whereas ISO 25964-2 recommends the sort of mappings that can be established between one KOS and another, SKOS presents a way of expressing the mappings when published to the Web.
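To make the division of labour with SKOS more concrete, the following Python sketch writes one thesaurus concept as Turtle text using SKOS properties (`skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:exactMatch`). The `ex:` namespace, the example IRIs and the helper function are hypothetical, and a production system would more likely use an RDF library than string formatting.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/thesaurus/> .\n"
)


def concept_to_turtle(local_id, pref_label, alt_labels=(), broader=(), exact_matches=()):
    """Serialise one concept as a Turtle statement using SKOS properties.

    Labels are assumed to contain no quotes or backslashes (no escaping is done).
    """
    parts = [f"ex:{local_id} a skos:Concept", f'skos:prefLabel "{pref_label}"@en']
    parts += [f'skos:altLabel "{label}"@en' for label in alt_labels]    # synonyms
    parts += [f"skos:broader ex:{parent}" for parent in broader]        # hierarchy
    parts += [f"skos:exactMatch <{iri}>" for iri in exact_matches]      # mapping to another KOS
    return " ;\n    ".join(parts) + " .\n"


turtle = PREFIXES + "\n" + concept_to_turtle(
    "motorcycles",
    "motorcycles",
    alt_labels=["motorbikes"],
    broader=["vehicles"],
    exact_matches=["http://example.com/taxonomy/motorbikes"],
)
print(turtle)
```

In this sketch the labels and the broader link correspond to the thesaurus content that ISO 25964-1 helps to design, while the `skos:exactMatch` line expresses the kind of mapping to another vocabulary that ISO 25964-2 discusses.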
The Dublin Core, also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen main metadata items for describing digital or physical resources. The Dublin Core Metadata Initiative (DCMI) is responsible for formulating the Dublin Core; DCMI is a project of the Association for Information Science and Technology (ASIS&T), a non-profit organization.
A thesaurus, sometimes called a synonym dictionary or dictionary of synonyms, is a reference work which arranges words by their meanings, sometimes as a hierarchy of broader and narrower terms, sometimes simply as lists of synonyms and antonyms. Thesauri are often used by writers to help find the best word to express an idea:
...to find the word, or words, by which [an] idea may be most fitly and aptly expressed
WordNet is a lexical database that links words through semantic relations including synonymy, hyponymy and meronymy. Synonyms are grouped into synsets with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. It was first created for English, and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website. There are now WordNets in more than 200 languages.
Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.
Learning Object Metadata is a data model, usually encoded in XML, used to describe a learning object and similar digital resources used to support learning. The purpose of learning object metadata is to support the reusability of learning objects, to aid discoverability, and to facilitate their interoperability, usually in the context of online learning management systems (LMS).
The Getty Thesaurus of Geographic Names is a product of the J. Paul Getty Trust included in the Getty Vocabulary Program. The TGN includes names and associated information about places. Places in TGN include administrative political entities and physical features. Current and historical places are included. Other information related to history, population, culture, art and architecture is included.
Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.
IMS VDEX (IMS Vocabulary Definition Exchange) is a mark-up language, or grammar, for controlled vocabularies used in data management. It was developed by IMS Global as an open specification, and the Final Specification was approved in February 2004.
ISO 2788 was the ISO international standard for monolingual thesauri for information retrieval, first published in 1974 and revised in 1986. The official title of the standard was "Guidelines for the establishment and development of monolingual thesauri".
AGROVOC is a multilingual controlled vocabulary covering all areas of interest of the Food and Agriculture Organization of the United Nations (FAO), including food, nutrition, agriculture, fisheries, forestry and the environment. By November 2021, the vocabulary consisted of over 39,600 concepts with up to 924,000 terms in up to 41 different languages. It is a collaborative effort, edited by a community of experts and coordinated by FAO. AGROVOC is made available by FAO as an RDF/SKOS-XL concept scheme and published as a linked data set aligned to 20 other vocabularies.
The AgMES initiative was developed by the Food and Agriculture Organization (FAO) of the United Nations and aims to encompass issues of semantic standards in the domain of agriculture with respect to description, resource discovery, interoperability, and data exchange for different types of information resources.
Agricultural Information Management Standards (AIMS) is a web site managed by the Food and Agriculture Organization of the United Nations (FAO) for accessing and discussing agricultural information management standards, tools and methodologies, connecting information workers worldwide to build a global community of practice. Information management standards, tools and good practices can be found on AIMS.
ISO/TC 46 is Technical Committee 46 of the International Organization for Standardization (ISO), responsible for Information and documentation.
The Art & Architecture Thesaurus (AAT) is a controlled vocabulary used for describing items of art, architecture, and material culture. The AAT contains generic terms, such as "cathedral", but no proper names, such as "Cathedral of Notre Dame." The AAT is used by, among others, museums, art libraries, archives, catalogers, and researchers in art and art history. The AAT is a thesaurus in compliance with ISO and NISO standards including ISO 2788, ISO 25964 and ANSI/NISO Z39.19.
ISO 5964 was the ISO standard for multilingual thesauri. Its full title was Guidelines for the establishment and development of multilingual thesauri. It was withdrawn in 2011, when it was replaced by ISO 25964-1. Further explanation is given on the official website for ISO 25964.
DSSim is an ontology mapping system conceived to achieve a certain level of the machine intelligence envisioned for the Semantic Web. The main driving factor behind its development was to provide an alternative to the existing heuristic or machine-learning-based approaches, using a multi-agent approach that makes use of uncertain reasoning. The system provides a possible approach to establishing machine understanding of Semantic Web data through multi-agent beliefs and conflict resolution.
In the context of information retrieval, a thesaurus is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.
UMBEL is a logically organized knowledge graph of 34,000 concepts and entity types that can be used in information science for relating information from disparate sources to one another. UMBEL was first released in July 2008, and version 1.00 was released in February 2011. Its last release was version 1.50. It was retired at the end of 2019.
The Nuovo soggettario is a subject indexing system managed and implemented by the National Central Library of Florence, which, as Italy's national book archive and the agency producing the Italian National Bibliography, has the institutional task of curating and developing subject indexing tools. It can be used in libraries, archives, media libraries, documentation centers and other cultural heritage institutions to index resources of various kinds on various media.
The National Agricultural Library Thesaurus (NALT) Concept Space is a controlled vocabulary of terms related to agricultural, biological, physical and social sciences. NALT is used by the National Agricultural Library (NAL) to annotate peer reviewed journal articles for NAL’s bibliographic citation database, AGRICOLA, PubAg, and Ag Data Commons. The Food Safety Research Information Office (FSRIO) and Agriculture Network Information Center (AgNIC) also use the NALT as the indexing vocabulary for their information systems. In addition, the NALT is used as an aid for locating information at the Agricultural Research Service (ARS) and the Economic Research Service (ERS) web sites and databases.