Subject Headings Authority File

The Subject Headings Authority File (German: Schlagwortnormdatei), or SWD, is a controlled vocabulary of index terms used primarily for subject indexing in library catalogs. The SWD is managed by the German National Library (DNB) in cooperation with various library networks. The inclusion of keywords in the SWD is governed by the "Rules for the Keyword Catalogue" (RSWK). Similar authority systems in other languages include the Library of Congress Subject Headings (LCSH) and RAMEAU (French: Répertoire d'autorité-matière encyclopédique et alphabétique unifié). Since April 2012 the SWD has been part of the Integrated Authority File (Gemeinsame Normdatei, or GND).

German language: West Germanic language

German is a West Germanic language that is mainly spoken in Central Europe. It is the most widely spoken and official or co-official language in Germany, Austria, Switzerland, South Tyrol (Italy), the German-speaking Community of Belgium, and Liechtenstein. It is also one of the three official languages of Luxembourg and a co-official language in the Opole Voivodeship in Poland. The languages which are most similar to German are the other members of the West Germanic language branch: Afrikaans, Dutch, English, the Frisian languages, Low German/Low Saxon, Luxembourgish, and Yiddish. There are also strong similarities in vocabulary with Danish, Norwegian and Swedish, although those belong to the North Germanic group. German is the second most widely spoken Germanic language, after English.

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other forms of knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.



The SWD has about 600,000 descriptors and 700,000 non-descriptors (synonyms and quasi-synonyms), as well as synonymous descriptor chains with references to a descriptor. Its growth rate is about 5.5% per year. About three-quarters of the descriptors refer to individual concepts (language identifier, person, entity, title, ethnography etc.) and a quarter are abstract concepts. Linking by hierarchical (about 115,000) and associative (26,000) relations is not very dense, so the SWD cannot be viewed as a thesaurus (figures as of mid-2003).
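
The relationship between descriptors and non-descriptors can be illustrated with a minimal Python sketch. It is not the SWD's actual data model or exchange format: the class names, the `resolve` and `links_per_descriptor` helpers, and the two sample entries are all hypothetical and chosen only for illustration. The final calculation uses the mid-2003 figures quoted above to show why the linkage is considered sparse.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A preferred term plus the non-descriptors that refer to it (hypothetical model)."""
    preferred: str
    non_descriptors: set[str] = field(default_factory=set)
    broader: set[str] = field(default_factory=set)   # hierarchical relations
    related: set[str] = field(default_factory=set)   # associative relations


class AuthorityFile:
    """Toy authority file: maps any entry term to its preferred descriptor."""

    def __init__(self) -> None:
        self._records: dict[str, Descriptor] = {}
        self._index: dict[str, str] = {}  # case-folded entry term -> preferred term

    def add(self, record: Descriptor) -> None:
        self._records[record.preferred] = record
        for term in {record.preferred, *record.non_descriptors}:
            self._index[term.casefold()] = record.preferred

    def resolve(self, term: str) -> str | None:
        """Return the preferred descriptor for a term, or None if it is unknown."""
        return self._index.get(term.casefold())

    def links_per_descriptor(self) -> float:
        """Average number of hierarchical and associative links per descriptor."""
        links = sum(len(r.broader) + len(r.related) for r in self._records.values())
        return links / len(self._records) if self._records else 0.0


# Made-up sample entries, not actual SWD records.
swd = AuthorityFile()
swd.add(Descriptor("Kraftfahrzeug", {"Auto", "Automobil"}, broader={"Fahrzeug"}))
swd.add(Descriptor("Fahrzeug"))

print(swd.resolve("automobil"))  # Kraftfahrzeug
print(swd.resolve("Fahrrad"))    # None

# Link density implied by the figures quoted in the text (mid-2003).
reported_density = (115_000 + 26_000) / 600_000
print(f"{reported_density:.3f}")  # 0.235
```

With roughly 0.235 relations per descriptor, most descriptors in the reported figures carry no hierarchical or associative link at all, which is consistent with the statement that the SWD cannot be viewed as a thesaurus.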

An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records. They are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance, a catalog or a search engine. A popular form of keywords on the web are tags which are directly visible and can be assigned by non-experts. Index terms can consist of a word, phrase, or alphanumerical term. They are created by analyzing the document either manually with subject indexing or automatically with automatic indexing or more sophisticated methods of keyword extraction. Index terms can either come from a controlled vocabulary or be freely assigned.

Synonym: word or phrase that means exactly or nearly the same as another word or phrase in the same language

A synonym is a word or phrase that means exactly or nearly the same as another lexeme in the same language. Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy. For example, the words begin, start, commence, and initiate are all synonyms of one another. Words are typically synonymous in one particular sense: for example, long and extended in the context long time or extended time are synonymous, but long cannot be used in the phrase extended family. Synonyms with the exact same meaning share a seme or denotational sememe, whereas those with inexactly similar meanings share a broader denotational or connotational sememe and thus overlap within a semantic field. The former are sometimes called cognitive synonyms and the latter, near-synonyms, plesionyms or poecilonyms.

Thesaurus: reference work that lists words grouped together according to similarity of meaning

In general usage, a thesaurus is a reference work that lists words grouped together according to similarity of meaning, in contrast to a dictionary, which provides definitions for words and generally lists them in alphabetical order. The main purpose of such reference works is for users "to find the word, or words, by which [an] idea may be most fitly and aptly expressed", to quote Peter Mark Roget, architect of the best-known thesaurus in the English language.

The terms in the SWD are also arranged in a separate classification with nearly 500 classes in 36 main groups.

Library classification: systems of coding and organizing documents or library materials

A library classification is a system of knowledge organization by which library resources are arranged and ordered systematically. Library classifications use a notational system that represents the order of topics in the classification and allows items to be stored in that order. Library classification systems group related materials together, typically arranged in a hierarchical tree structure. A different kind of classification system, called a faceted classification system, is also widely used; it allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways. The library classification numbers can be considered identifiers for resources but are distinct from the International Standard Book Number (ISBN) or International Standard Serial Number (ISSN) system.


The various terms are placed in a classification scheme and also contain references to sources, related terms, preferred terms and, to a lesser extent, hierarchical links. Because of this low degree of linkage, however, the SWD is probably not a complete thesaurus. The SWD is available online through the catalogue database ILTIS and is available for a fee as BIBLIODATA, together with the Name Authority File (PND) and the Corporate Bodies Authority File (GKD), on the standard data CD-ROM and the standard file TITAN. In both cases, the user interface of the SWD leaves much room for improvement. Instead of making the SWD accessible as a user-friendly navigation tool, it assumes that users are familiar with the SWD and its classification scheme and will enter the appropriate descriptor in the correct form before running a search. Navigating the system or moving from one concept to another by means of hyperlinks is not possible. The German Library's strategy of distributing authority records commercially makes wider use of the SWD difficult, e.g. in other keyword systems.

The Name Authority File (PND) is an authority file of people, which served primarily to access literature in libraries. The PND was built up between 1995 and 1998 and was published by the German National Library until 2012. For each person there is a record with his or her name, birth and occupation, linked to a unique identifier, the PND number.

Corporate Bodies Authority File: authority control

The Corporate Bodies Authority File or GKD is a German authority control for the organisation of corporation names from catalogues. It is used mainly for documentation in libraries. Like the Subject Headings Authority File and the Name Authority File, the GKD is maintained and updated by the German National Library (DNB), the Bavarian State Library, the Berlin State Library and, since 1997, the Austrian National Library, with several library networks taking part. The responsible editor is the State Library in Berlin. The Common Corporate File was created in the 1970s from the catalogue data of the Journal Database (ZDB). In April 2004 it contained more than 915,000 records.

For the exchange of authority records, there is a dedicated variant of the Machine Exchange Format for Libraries (MAB). The head office of the Southwest German Library Network (SWB) offers online access (OSWD).

See also

Faceted Application of Subject Terminology (FAST) is a general use controlled vocabulary based on the Library of Congress Subject Headings (LCSH). FAST is developed as a part of WorldCat by the Online Computer Library Center, Inc. (OCLC), with the goal of making subject cataloging less costly and easier to implement in online contexts. FAST headings separate topical data from non-topical data, such as information about a document's form, chronological coverage, or geographical coverage.

Canadian Subject Headings (CSH) is a list of subject headings in the English language, using controlled vocabulary, to access and express the topic content of documents on Canada and Canadian topics. Library and Archives Canada publishes and maintains CSH on the Web. Prior to the merger of the National Library of Canada and the National Archives of Canada, the National Library of Canada published a print version of CSH.

Related Research Articles

Medical Subject Headings (MeSH) is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences; it serves as a thesaurus that facilitates searching. Created and updated by the United States National Library of Medicine (NLM), it is used by the MEDLINE/PubMed article database and by NLM's catalog of book holdings. MeSH is also used by the ClinicalTrials.gov registry to classify which diseases are studied by the trials registered there.

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.

In library science, authority control is a process that organizes bibliographic information, for example in library catalogs by using a single, distinct spelling of a name (heading) or a numeric identifier for each topic. The word authority in authority control derives from the idea that the names of people, places, things, and concepts are authorized, i.e., they are established in one particular form. These one-of-a-kind headings or identifiers are applied consistently throughout catalogs which make use of the respective authority file, and are applied for other methods of organizing data such as linkages and cross references. Each controlled entry is described in an authority record in terms of its scope and usage, and this organization helps the library staff maintain the catalog and make it user-friendly for researchers.

The Library of Congress Subject Headings (LCSH) comprise a thesaurus of subject headings, maintained by the United States Library of Congress, for use in bibliographic records. LC Subject Headings are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. LCSH first appeared in 1898, a year after the publication of the Library of Congress Classification (1897). The 38th edition was published in 2016. The Library of Congress has since ceased print publication and instead publishes a weekly updated list as a supplement to the 38th edition. LCSH are applied to every item within a library's collection and facilitate a user's access to items in the catalogue that pertain to similar subject matter. If users could only locate items by title or other descriptive fields, such as author or publisher, they would have to spend an enormous amount of time searching for items of related subject matter and would undoubtedly miss many items because of the inefficient search capability.

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

The Art & Architecture Thesaurus (AAT) is a controlled vocabulary used for describing items of art, architecture, and material culture. The AAT contains generic terms, such as "cathedral," but no proper names, such as "Cathedral of Notre Dame." The AAT is used by, among others, museums, art libraries, archives, catalogers, and researchers in art and art history. The AAT is a thesaurus in compliance with ISO and NISO standards including ISO 2788, ISO 25964 and ANSI/NISO Z39.19.

DeCS – Health Sciences Descriptors is a structured, trilingual thesaurus created in 1987 by BIREME (Latin American and Caribbean Center on Health Sciences Information) for indexing scientific journal articles, books, proceedings of congresses, technical reports and other types of materials, as well as for searching and retrieving scientific information in LILACS, MEDLINE and other databases. In the Virtual Health Library (VHL), DeCS is the tool that permits navigation between records and sources of information through controlled concepts organized in Portuguese, Spanish and English.

Polythematic structured-subject heading system

Polythematic structured-subject heading system is a bilingual Czech–English controlled vocabulary of subject headings developed and maintained by the National Technical Library in Prague. It was designed for describing and searching information resources according to their subject. PSH contains more than 13,900 terms, which cover the main fields of human knowledge.

ISO 25964 is the international standard for thesauri, published in two parts as follows:

 ISO 25964  Information and documentation - Thesauri and interoperability with other vocabularies
 Part 1: Thesauri for information retrieval [published August 2011]
 Part 2: Interoperability with other vocabularies [published March 2013]
Integrated Authority File: international authority file for personal names, subject headings and corporate bodies

The Integrated Authority File or GND is an international authority file for the organisation of personal names, subject headings and corporate bodies from catalogues. It is used mainly for documentation in libraries and increasingly also by archives and museums. The GND is managed by the German National Library in cooperation with various regional library networks in German-speaking Europe and other partners. The GND falls under the Creative Commons Zero (CC0) licence.

The Llista d'encapçalaments de matèria en català (LEMAC) is a Catalan-language controlled vocabulary of subject headings: linguistic expressions that cataloguers use to represent the thematic content of documents (a concept, event, name, or title) and that allow users to search a catalogue, bibliography or index. LEMAC is created and maintained by the Servei de Normalització Bibliogràfica of the National Library of Catalonia, and librarians apply it to the documents being catalogued so that users can search for items through access points other than authors, titles or publishers. Subject headings also allow users to retrieve items together when the topic is the same and, at the same time, show the topics covered in a given collection. LEMAC was developed following the spirit of the Llei 4/1993 del Sistema Bibliotecari de Catalunya in order to "gather in a same union catalogue the bibliographic references integrating the different library resources of the Sistema Bibliotecari de Catalunya".

In the context of information retrieval, a thesaurus is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.

The LC Linked Data Service is an initiative of the Library of Congress that publishes authority data as linked data. It is commonly referred to by its URI:

The Répertoire de vedettes-matière de l'Université Laval (RVM) is a controlled vocabulary made up of four mostly bilingual thesauruses. It is designed for document indexers, organizations that want to describe the content of their documents or of their products and services, as well as anyone who wants to clarify vocabulary in English and French as part of their work or research.