Simple Knowledge Organization System

Status: Published (W3C Recommendation)
Year started: 1997
Latest version: Core, Reference, RDF, Primer (August 18, 2009)
Organization: World Wide Web Consortium (W3C)
Committee: Semantic Web Deployment Working Group
Authors: Alistair Miles, Sean Bechhofer
Base standards: RDF
Related standards: RDFa, OWL, ISO 25964, Dublin Core
Domain: Semantic Web
Abbreviation: SKOS
Website: www.w3.org/2009/08/skos-reference/skos.html

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

History

DESIRE II project (1997–2000)

The most direct ancestor of SKOS was the RDF Thesaurus work undertaken in the second phase of the EU DESIRE project. [1] Motivated by the need to improve the user interface and usability of multi-service browsing and searching, [2] the project produced a basic RDF vocabulary for thesauri. As noted later in the SWAD-Europe workplan, the DESIRE work was adopted and further developed in the SOSIG and LIMBER projects. A version of the DESIRE/SOSIG implementation was described at W3C's QL'98 workshop in the paper A Query and Inference Service for RDF, [3] motivating early work on RDF rule and query languages.

LIMBER (1999–2001)

SKOS built upon the output of the Language Independent Metadata Browsing of European Resources (LIMBER) project, which was funded by the European Community as part of the Information Society Technologies programme. In the LIMBER project, CCLRC further developed an RDF thesaurus interchange format, [4] which was demonstrated on the European Language Social Science Thesaurus (ELSST) at the UK Data Archive. ELSST is a multilingual version of the English-language Humanities and Social Science Electronic Thesaurus (HASSET) and was planned to be used by the Council of European Social Science Data Archives (CESSDA).

SWAD-Europe (2002–2004)

SKOS as a distinct initiative began in the SWAD-Europe project, bringing together partners from DESIRE, SOSIG (ILRT), and LIMBER (CCLRC) who had worked with earlier versions of the schema. It was developed in the Thesaurus Activity Work Package of the Semantic Web Advanced Development for Europe (SWAD-Europe) project. [5] SWAD-Europe was funded by the European Community as part of the Information Society Technologies programme. The project was designed to support W3C's Semantic Web Activity through research, demonstrators, and outreach efforts conducted by the five project partners: ERCIM, the ILRT at Bristol University, HP Labs, CCLRC, and Stilo. The first releases of SKOS Core and SKOS Mapping were published at the end of 2003, along with other deliverables on RDF encoding of multilingual thesauri [6] and thesaurus mapping. [7]

Semantic web activity (2004–2005)

Following the termination of SWAD-Europe, the SKOS effort was supported by the W3C Semantic Web Activity [8] in the framework of the Semantic Web Best Practices and Deployment Working Group. [9] During this period, the focus was on both consolidating SKOS Core and developing practical guidelines for porting and publishing thesauri for the Semantic Web.

Development as W3C Recommendation (2006–2009)

The SKOS main published documents — the SKOS Core Guide, [10] the SKOS Core Vocabulary Specification, [11] and the Quick Guide to Publishing a Thesaurus on the Semantic Web [12] — were developed through the W3C Working Draft process. Principal editors of SKOS were Alistair Miles, [13] initially Dan Brickley, and Sean Bechhofer.

The Semantic Web Deployment Working Group, [14] chartered for two years (May 2006 – April 2008), included in its charter the goal of advancing SKOS along the W3C Recommendation track. The roadmap projected SKOS as a Candidate Recommendation by the end of 2007, and as a Proposed Recommendation in the first quarter of 2008. The main issues to resolve were its precise scope of use and its articulation with other RDF languages and standards used in libraries (such as Dublin Core). [15] [16]

Formal release (2009)

On August 18, 2009, W3C released the new standard that builds a bridge between the world of knowledge organization systems – including thesauri, classifications, subject headings, taxonomies, and folksonomies – and the linked data community, bringing benefits to both. Libraries, museums, newspapers, government portals, enterprises, social networking applications, and other communities that manage large collections of books, historical artifacts, news reports, business glossaries, blog entries, and other items can now use SKOS [17] to leverage the power of linked data.

Historical view of components

SKOS was originally designed as a modular and extensible family of languages, organized as SKOS Core, SKOS Mapping, SKOS Extensions, and a Metamodel. The entire specification is now complete within the namespace http://www.w3.org/2004/02/skos/core#.

Overview

In addition to the reference itself, the SKOS Primer (a W3C Working Group Note) summarizes the Simple Knowledge Organization System.

SKOS [18] defines the classes and properties sufficient to represent the common features found in a standard thesaurus. It is based on a concept-centric view of the vocabulary, where primitive objects are not terms, but abstract notions represented by terms. Each SKOS concept is defined as an RDF resource. Each concept can have RDF properties attached, including preferred, alternative, and hidden labels, notations, and documentation notes.

Concepts can be organized in hierarchies using broader-narrower relationships, or linked by non-hierarchical (associative) relationships. Concepts can be gathered in concept schemes, to provide consistent and structured sets of concepts, representing whole or part of a controlled vocabulary.

Element categories

The principal element categories of SKOS are concepts, labels, notations, documentation, semantic relations, mapping properties, and collections. The associated elements are listed in the table below.

SKOS Vocabulary Organized by Theme

| Concepts | Labels & Notation | Documentation | Semantic Relations | Mapping Properties | Collections |
|---|---|---|---|---|---|
| Concept | prefLabel | note | broader | broadMatch | Collection |
| ConceptScheme | altLabel | changeNote | narrower | narrowMatch | OrderedCollection |
| inScheme | hiddenLabel | definition | related | relatedMatch | member |
| hasTopConcept | notation | editorialNote | broaderTransitive | closeMatch | memberList |
| topConceptOf | | example | narrowerTransitive | exactMatch | |
| | | historyNote | semanticRelation | mappingRelation | |
| | | scopeNote | | | |

Concepts

The SKOS vocabulary is based on concepts. Concepts are the units of thought (ideas, meanings, or objects and events, whether instances or categories) which underlie many knowledge organization systems. As such, concepts exist in the mind as abstract entities which are independent of the terms used to label them. In SKOS, a Concept (itself an instance of the OWL Class) is used to represent items in a knowledge organization system (terms, ideas, meanings, etc.) or such a system's conceptual or organizational structure.

A ConceptScheme is analogous to a vocabulary, thesaurus, or other way of organizing concepts. SKOS does not constrain a concept to be within a particular scheme, nor does it provide any way to declare a complete scheme: there is no way to say the scheme consists only of certain members. A top concept, linked to its scheme with hasTopConcept or topConceptOf, is one of the uppermost concepts in a hierarchical scheme.

Labels and notations

Each SKOS label is a string of Unicode characters, optionally with a language tag, that is associated with a concept. The prefLabel is the preferred human-readable string (at most one per language tag), while altLabel can be used for alternative strings, and hiddenLabel can be used for strings that are useful to associate with the concept but are not meant for humans to read.
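The labelling rules above can be sketched in plain Python. This is only an illustration, not an RDF library: the label data and the function names `duplicate_pref_languages` and `searchable_strings` are hypothetical, and labels are modelled as simple tuples rather than RDF literals.

```python
from collections import Counter

# Hypothetical labels for one concept: (property, text, language tag).
labels = [
    ("prefLabel", "animals", "en"),
    ("prefLabel", "animaux", "fr"),
    ("altLabel", "fauna", "en"),
    ("hiddenLabel", "animls", "en"),  # misspelling, kept for search matching only
]

def duplicate_pref_languages(labels):
    """Return the language tags that carry more than one prefLabel."""
    counts = Counter(lang for prop, _text, lang in labels if prop == "prefLabel")
    return sorted(lang for lang, n in counts.items() if n > 1)

def searchable_strings(labels, lang):
    """Return every label string (preferred, alternative, hidden) in one language."""
    return sorted(text for _prop, text, label_lang in labels if label_lang == lang)

print(duplicate_pref_languages(labels))
print(searchable_strings(labels, "en"))
```

A search index would typically match against all three label types, while a display layer would show only the prefLabel for the user's language.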

A SKOS notation is similar to a label, but this literal string has a datatype, like integer, float, or date; the datatype can even be made up (see 6.5.1 Notations, Typed Literals and Datatypes in the SKOS Reference). The notation is useful for classification codes and other strings not recognizable as words.

Documentation

The documentation (note) properties provide basic information about SKOS concepts. All of these properties are sub-properties of skos:note; they just provide more specific kinds of information. The property definition, for example, should contain a full description of the subject resource. More specific note types can be defined in a SKOS extension, if desired. With sub-property inference applied, a query for <A> skos:note ? will obtain all the notes about <A>, including definitions, examples, scope notes, history and change notes, and editorial notes.
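How a query on skos:note can return the more specific notes may be easier to see in a small sketch. The following plain-Python example emulates sub-property inference by hand; the triples and the `notes_about` helper are hypothetical, and a real deployment would rely on an RDF store with RDFS reasoning instead.

```python
# The specialised SKOS documentation properties, each a sub-property of skos:note.
NOTE_SUBPROPERTIES = {
    "skos:changeNote", "skos:definition", "skos:editorialNote",
    "skos:example", "skos:historyNote", "skos:scopeNote",
}

# Hypothetical data: (subject, predicate, object) triples.
triples = [
    ("ex:A", "skos:definition", "A domesticated feline."),
    ("ex:A", "skos:scopeNote", "Excludes wild cats."),
    ("ex:A", "skos:prefLabel", "cat"),
    ("ex:B", "skos:example", "Labrador"),
]

def notes_about(subject, triples):
    """Emulate '<subject> skos:note ?' with sub-property inference applied."""
    return [
        (predicate, obj)
        for s, predicate, obj in triples
        if s == subject
        and (predicate == "skos:note" or predicate in NOTE_SUBPROPERTIES)
    ]

for predicate, text in notes_about("ex:A", triples):
    print(predicate, "->", text)
```

The prefLabel triple is skipped because labels are not notes, while the definition and scope note are both returned.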

Any of these SKOS Documentation properties can refer to several object types: a literal (e.g., a string); a resource node that has its own properties; or a reference to another document, for example using a URI. This enables the documentation to have its own metadata, like creator and creation date.

Specific guidance on SKOS documentation properties can be found in the SKOS Primer Documentary Notes.

Semantic relations

SKOS semantic relations are intended to provide ways to declare relationships between concepts within a concept scheme. While there are no restrictions precluding their use with two concepts from separate schemes, this is discouraged because it is likely to overstate what can be known about the two schemes, and perhaps link them inappropriately.

The property related asserts an associative relationship between two concepts; no hierarchy or generality relation is implied. The properties broader and narrower are used to assert a direct hierarchical link between two concepts. The direction may be unexpected: the relation <A> broader <B> means that A has a broader concept called B, hence that B is broader than A. The property narrower follows the same pattern.

While the casual reader might expect broader and narrower to be transitive properties, SKOS does not declare them as such. Rather, the properties broaderTransitive and narrowerTransitive are defined as transitive super-properties of broader and narrower. These super-properties are (by convention) not used in declarative SKOS statements. Instead, when a broader or narrower relation is used in a triple, the corresponding transitive super-property also holds; and transitive relations can be inferred (and queried) using these super-properties.
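The difference between the direct broader relation and its transitive super-property can be sketched as follows. This is a minimal plain-Python illustration under assumed data: the `broader` mapping, the `ex:` identifiers, and both function names are hypothetical, and the traversal simply computes what a reasoner would infer for broaderTransitive.

```python
# Hypothetical direct assertions: concept -> set of its direct broader concepts.
# "ex:cats broader ex:mammals" reads "cats has the broader concept mammals".
broader = {
    "ex:cats": {"ex:mammals"},
    "ex:mammals": {"ex:animals"},
    "ex:animals": set(),
}

def narrower_of(concept, broader):
    """Direct narrower concepts, derived by inverting the broader assertions."""
    return sorted(c for c, parents in broader.items() if concept in parents)

def broader_transitive(concept, broader):
    """All ancestors reachable through broader, i.e. the inferred broaderTransitive."""
    seen = set()
    stack = list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen

print(broader["ex:cats"])                      # direct links only
print(broader_transitive("ex:cats", broader))  # includes inferred ancestors
```

Keeping the asserted data limited to direct links, and deriving the transitive view on demand, mirrors the convention described above.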

Mapping

SKOS mapping properties are intended to express matching (exact or fuzzy) of concepts from one concept scheme to another, and by convention are used only to connect concepts from different schemes. The properties relatedMatch, broadMatch, and narrowMatch are a convenience, with the same meaning as the semantic properties related, broader, and narrower. (See the previous section regarding the meanings of broader and narrower.)

The property relatedMatch makes a simple associative relationship between two concepts. When concepts are so closely related that they can generally be used interchangeably, exactMatch is the appropriate property (exactMatch relations are transitive, unlike any of the other Match relations). The closeMatch property indicates concepts that can only sometimes be used interchangeably, and so it is not a transitive property.
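The practical effect of exactMatch being transitive while closeMatch is not can be shown with a short sketch. The scheme prefixes, the concept names, and the `exact_match_closure` function below are hypothetical; the code treats exactMatch as symmetric and transitive, as the SKOS Reference defines it, and deliberately does not follow closeMatch links.

```python
from collections import defaultdict

# Hypothetical mappings between concepts in four different schemes (a, b, c, d).
exact_match = [("a:cat", "b:felis-catus"), ("b:felis-catus", "c:house-cat")]
close_match = [("c:house-cat", "d:pet")]  # not chained: closeMatch is not transitive

def exact_match_closure(concept, pairs):
    """Concepts reachable from `concept` by following exactMatch in either direction."""
    neighbours = defaultdict(set)
    for left, right in pairs:
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen = {concept}
    stack = [concept]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    seen.discard(concept)
    return seen

print(exact_match_closure("a:cat", exact_match))
```

Here a:cat is inferred to match c:house-cat exactly, but nothing is inferred about d:pet, because the only link to it is a closeMatch.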

Concept collections

The concept collections (Collection, OrderedCollection) are labeled and/or ordered groups of SKOS concepts; OrderedCollection is used when the order of the members is significant. Collections can be nested, and can either be identified by a URI or left unnamed (as a blank node). Neither a SKOS Concept nor a ConceptScheme may be a Collection, nor vice versa; and SKOS semantic relations can only be used with a Concept (not a Collection). The items in a Collection cannot be connected to other SKOS Concepts through the Collection node; individual relations must be defined for each Concept in the Collection.
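Nesting and the "no relations through the Collection node" rule can be illustrated with a small plain-Python sketch. The collection names, members, and the `member_concepts` helper are hypothetical, and ordered member lists stand in for what would be memberList values in RDF.

```python
# Hypothetical collections: name -> ordered list of members.
# A member is either a concept or another (nested) collection.
collections = {
    "ex:mammals-by-size": ["ex:small", "ex:large"],
    "ex:small": ["ex:mouse", "ex:cat"],
    "ex:large": ["ex:horse"],
}

def member_concepts(collection, collections):
    """Flatten nested collections into the concepts they ultimately contain."""
    result = []
    for member in collections[collection]:
        if member in collections:  # nested collection: recurse into it
            result.extend(member_concepts(member, collections))
        else:                      # plain concept
            result.append(member)
    return result

# Semantic relations attach to each concept individually, never to the collection.
broader_links = [(c, "ex:mammals") for c in member_concepts("ex:mammals-by-size", collections)]
print(broader_links)
```

The final step shows the consequence of the rule above: to relate every grouped concept to ex:mammals, one broader statement per concept is needed.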

Community and participation

All development work is carried out via public-esw-thes@w3.org, a completely open and publicly archived [19] mailing list devoted to discussion of issues relating to knowledge organisation systems, information retrieval, and the Semantic Web. Anyone may participate informally in the development of SKOS by joining the discussions on the list; informal participation is warmly welcomed. Anyone who works for a W3C member organisation may formally participate in the development process by joining the Semantic Web Deployment Working Group, which entitles individuals to edit specifications and to vote on publication decisions.

Applications

Tools

Data

There are publicly available SKOS data sources.

Relationships with other standards

Metamodel

The SKOS metamodel is broadly compatible with the data model of ISO 25964-1 – Thesauri for Information Retrieval. This data model can be viewed and downloaded from the website for ISO 25964. [41]

Figure: Semantic model of the information elements of SKOS

Thesaurus standards

SKOS development has involved experts from both the RDF and library communities, and SKOS is intended to allow easy migration of thesauri defined by standards such as NISO Z39.19 – 2005 [42] or ISO 25964. [41]

Other semantic web standards

SKOS is intended to provide a way to make legacy concept schemes available to Semantic Web applications that is simpler than the more complex ontology language OWL. OWL is intended to express complex conceptual structures, which can be used to generate rich metadata and support inference tools. However, constructing useful web ontologies is demanding in terms of expertise, effort, and cost. In many cases, this type of effort might be superfluous or unsuited to requirements, and SKOS might be a better choice. The extensibility of RDF makes it possible to incorporate or extend SKOS vocabularies into more complex vocabularies, including OWL ontologies.

References

  1. Desire: Development of a European Service for Information on Research and Education, Desire Consortium, August 7, 2000, archived from the original on July 25, 2011
  2. Desire: Research Deliverables: D3.1, Desire Consortium, archived from the original on May 9, 2008
  3. "A Query and Inference Service for RDF". www.w3.org.
  4. Miller, Ken; Matthews, Brian (24 January 2006). "Having the Right Connections: the LIMBER Project". Journal of Digital Information. 1 (8).
  5. "Semantic Web Advanced Development for Europe (SWAD-Europe)". www.w3.org.
  6. "SWAD-Europe Deliverable 8.3 : RDF Encoding of Multilingual Thesauri". Archived from the original on 2006-06-16.
  7. "SWAD-Europe Deliverable 8.4 : Inter-Thesaurus Mapping". Archived from the original on 2006-04-30.
  8. "W3C Semantic Web Activity Homepage". www.w3.org.
  9. "Porting Thesauri Task Force (PORT) / Semantic Web Best Practices and Deployment Working Group / W3C Semantic Web Activity". www.w3.org.
  10. SKOS Core Guide W3C Working Draft 2 November 2005
  11. SKOS Core Vocabulary Specification W3C Working Draft 2 November 2005
  12. Quick Guide to Publishing a Thesaurus on the Semantic Web W3C Working Draft 17 May 2005
  13. "Alistair Miles". purl.org.
  14. "W3C Semantic Web Deployment Working Group". www.w3.org.
  15. SKOS: Requirements for Standardization. The paper by Alistair Miles presented in October 2006 at the International Conference on Dublin Core and Metadata Applications.
  16. Retrieval and the Semantic Web, incorporating a Theory of Retrieval Using Structured Vocabularies. Dissertation on the theory of retrieval using structured vocabularies by Alistair Miles.
  17. "SKOS Simple Knowledge Organization System Reference". www.w3.org.
  18. "SKOS Simple Knowledge Organization System Reference". www.w3.org.
  19. public-esw-thes@w3.org online archive. Archives of mailing list used for SKOS development.
  20. "About the Library of Congress Authorities". Archived from the original on 2010-01-03.
  21. "Semantic Web Environmental Directory". Archived from the original on 2006-08-30.
  22. "A Method to Convert Thesauri to SKOS". thesauri.cs.vu.nl.
  23. Subject classification using DITA and SKOS by IBM developerWorks.
  24. Unilexicon web based visual taxonomy editor
  25. "eScienceCenter/ThesauRex". GitHub. 22 March 2020.
  26. "Opentheso - Copyright".
  27. TemaTres is an open source web-based vocabulary server for managing controlled vocabularies, taxonomies and thesauruses
  28. ThManager an Open Source Tool for creating and visualizing SKOS RDF vocabularies.
  29. "Validation Services - SKOS Simple Knowledge Organization System". www.w3.org.
  30. "VocBench: A Collaborative Management System for SKOS-XL Thesauri". vocbench.uniroma2.it.
  31. PoolParty is a thesaurus management system and a SKOS editor for the Semantic Web.
  32. qSKOS is an open-source tool for SKOS vocabulary quality assessment.
  33. SKOSEd SKOS plugin for Protege 4
  34. Protégé 4 Protégé 4 OWL editor
  35. SKOS Java API Java API for SKOS
  36. Model Futures Excel SKOS Exporter
  37. Lexaurus is an enterprise thesaurus management system and multi-format editor.
  38. Ricci, Semweb LLC, Fabio. "SKOS Shuttle". skosshuttle.ch.
  39. "TopBraid Enterprise Vocabulary Net - TopQuadrant, Inc".
  40. "SKOS/Datasets - Semantic Web Standards". www.w3.org.
  41. "ISO 25964 – the international standard for thesauri and interoperability with other vocabularies - NISO website". www.niso.org.
  42. NISO Standards Z39.19 – 2005 : Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies