Lonclass

The BBC's Lonclass ("London Classification") is a subject classification system used internally in the corporation's archives. [1]

Lonclass is derived from the Universal Decimal Classification (UDC), itself a reworking of the earlier Dewey Decimal Classification (DDC). Lonclass dates from the 1960s, [1] whereas UDC was created from DDC in the late 19th century. The BBC adaptation of UDC preserves the core feature that distinguishes UDC from DDC: an emphasis on compositional semantics, which allows new items to be expressed in terms of relationships between known items.

Lonclass and UDC codes (like DDC codes) are based on decimal numbers, but unlike DDC they use additional punctuation to express patterns of relationships and reusable qualifiers. While Lonclass makes a few structural adjustments to UDC to support its emphasis on television and radio content, its main distinction lies in the actual set of topics recorded in its authority file and in specific BBC catalogue records. Unlike UDC and DDC, which are widely used across the library community, Lonclass has remained a BBC-internal system since its creation in the 1960s. [1] The Lonclass vocabulary contains some 300,000 subject terms. [2]

Examples

A Lonclass or UDC code may represent a compound subject such as "Report on the environmental impact of the decline of tin mining in Sweden in the 20th century", expressed as a single sequence of numbers and punctuation. Such a complex code can be broken down into primitives such as "Sweden" or "tin mining" (which could itself be broken down further into "tin" and "mining").
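The compositional structure described above can be sketched in code. The following is a minimal, illustrative parser: the numeric codes are invented for the example (not actual Lonclass assignments), and it assumes only the UDC-style conventions that ":" relates two concepts, parentheses mark a place facet, and quotation marks mark a time facet.

```python
import re

def facets(code: str) -> list[str]:
    """Split a composite UDC-style code into its primitive facets.

    Recognised tokens (a simplification of real UDC syntax):
      (...)   place facet, e.g. a country code
      "..."   time facet, e.g. a century
      digits and dots   a subject number
    The ':' connector between subject numbers is skipped.
    """
    return re.findall(r'\([^)]*\)|"[^"]*"|[\d.]+', code)

# Hypothetical composite code: subject : subject (place) "time"
print(facets('504.03:622.3(485)"19"'))
# -> ['504.03', '622.3', '(485)', '"19"']
```

Each returned facet would correspond to an entry in the authority file ("environmental impact", "mining", "Sweden", "20th century"), which is how a new compound topic can be catalogued without coining a wholly new term.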

Notes

  1. Vardan, Vijaylakshmi (2008). "Book Review: A Handbook for Media Librarians". Library Management. 29 (8/9): 814–819. doi:10.1108/01435120810917576. ISSN 0143-5124.
  2. Dowman, M.; Tablan, V.; Cunningham, H.; Ursu, C.; Popov, B. (2005). "Semantically Enhanced Television News through Web and Video Integration". In Second European Semantic Web Conference (ESWC 2005).
