Semantic spectrum

The semantic spectrum, sometimes referred to as the ontology spectrum, the smart data continuum, or semantic precision, is a series of increasingly precise, or semantically expressive, definitions for data elements in knowledge representations, especially for machine use.

At the low end of the spectrum is a simple binding of a single word or phrase and its definition. At the high end is a full ontology that specifies relationships between data elements using precise URIs for relationships and properties.

With increased specificity comes increased precision and the ability to use tools to integrate systems automatically, but also an increased cost to build and maintain a metadata registry.

Some steps in the semantic spectrum include the following:

  1. glossary: A simple list of terms and their definitions. A glossary focuses on creating a complete list of domain-specific terms and acronyms. It is useful for creating clear and unambiguous definitions for terms, and because it can be created with simple word processing tools, few technical tools are necessary.
  2. controlled vocabulary: A simple list of terms, definitions and naming conventions. A controlled vocabulary frequently has some type of oversight process associated with adding or removing data element definitions to ensure consistency. Terms are often defined in relationship to each other.
  3. data dictionary: Terms, definitions, naming conventions and one or more representations of the data elements in a computer system. Data dictionaries often define data types, validation checks such as enumerated values and the formal definitions of each of the enumerated values.
  4. data model: Terms, definitions, naming conventions and one or more representations of the data elements, as well as the beginning of a specification of the relationships between data elements, including abstractions and containers.
  5. taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element". The difference between a data model and a formal taxonomy is the arrangement of data elements into a formal tree structure where each element in the tree is a formally defined concept with associated properties.
  6. ontology: A complete, machine-readable specification of a conceptualization using URIs (and, later, IRIs) for all data elements, properties and relationship types. The W3C standard language for representing ontologies is the Web Ontology Language (OWL). Ontologies frequently contain formal business rules, expressed as discrete logic statements, that relate data elements to one another.
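
As a rough illustration of how one concept gains precision along the spectrum, the following Python sketch represents a hypothetical "order status" element at three of the steps above: a glossary entry, a data dictionary entry, and ontology-style statements. The element names, the `http://example.org/` namespace and the `validate` helper are assumptions made for illustration and are not taken from any standard; only the RDF Schema property URIs are real.

```python
from dataclasses import dataclass, field

# Glossary level: a term bound to a human-readable definition.
glossary = {
    "order status": "The current stage of a customer order in fulfilment.",
}


# Data dictionary level: adds a data type and enumerated values,
# each with its own definition, so that values can be validated.
@dataclass
class DataElement:
    name: str
    definition: str
    data_type: type
    enumerated_values: dict[str, str] = field(default_factory=dict)

    def validate(self, value) -> bool:
        """Return True if value has the declared type and, when the
        element is coded, is one of the enumerated values."""
        if not isinstance(value, self.data_type):
            return False
        return not self.enumerated_values or value in self.enumerated_values


order_status = DataElement(
    name="OrderStatusCode",
    definition=glossary["order status"],
    data_type=str,
    enumerated_values={
        "OPEN": "The order has been placed but not shipped.",
        "SHIPPED": "The order has left the warehouse.",
    },
)

# Ontology level: every element, property and relationship type is
# identified by a URI and stated as a subject-predicate-object triple.
EX = "http://example.org/schema#"  # hypothetical namespace
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
triples = {
    (EX + "OrderStatusCode", RDFS + "subClassOf", EX + "StatusCode"),
    (EX + "OrderStatusCode", RDFS + "comment", order_status.definition),
    (EX + "hasStatus", RDFS + "range", EX + "OrderStatusCode"),
}

print(order_status.validate("OPEN"))     # True
print(order_status.validate("PENDING"))  # False
```

The glossary entry can only be read by a person, the data dictionary entry can be used to reject invalid values, and the triples could in principle be loaded into an RDF or OWL tool and combined with statements from other systems.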

Typical questions for determining semantic precision

The following is a list of questions that may arise in determining semantic precision.

correctness
How can correct syntax and semantics be enforced? Are tools (such as XML Schema) readily available to validate syntax of data exchanges?
adequacy/expressivity/scope
Does the system represent everything that is of practical use for the purpose? Is an emphasis being placed on data that is externalized (exposed or transferred between systems)?
efficiency
How efficiently can the representation be searched or queried and, possibly, reasoned over?
complexity
How steep is the learning curve for defining new concepts, querying for them or constraining them? Are there appropriate tools for simplifying typical workflows? (See also: ontology editor)
translatability
Can the representation easily be transformed (e.g. by vocabulary-based transformation) into an equivalent representation so that semantic equivalence is ensured?
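
The translatability question can be made concrete with a small sketch of a vocabulary-based transformation, in which a record keyed by one system's local element names is rewritten using a mapping to a shared vocabulary. The field names and the mapping table below are hypothetical. The sketch rejects a record containing an unmapped term rather than passing it through, since equivalence cannot be claimed for a term that has no counterpart.

```python
# Hypothetical mapping from a local system's element names to the
# terms of a shared (target) vocabulary.
LOCAL_TO_SHARED = {
    "cust_nm": "CustomerName",
    "ord_dt": "OrderDate",
    "stat": "OrderStatusCode",
}


def translate(record: dict, mapping: dict) -> dict:
    """Rename every key of record according to mapping.

    Raises KeyError if a key has no mapping, so an incomplete
    vocabulary is detected instead of producing a partial result.
    """
    unmapped = sorted(key for key in record if key not in mapping)
    if unmapped:
        raise KeyError(f"no mapping for: {', '.join(unmapped)}")
    return {mapping[key]: value for key, value in record.items()}


local_record = {"cust_nm": "Ada Lovelace", "ord_dt": "2024-01-15", "stat": "OPEN"}
shared_record = translate(local_record, LOCAL_TO_SHARED)
print(shared_record)
```

This sketch only renames elements. A fuller transformation would usually also convert values, for example date formats or code lists, which requires the data types and enumerations found further along the spectrum.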

Determining location on the semantic spectrum

Many organizations today are building a metadata registry to store their data definitions and to perform metadata publishing. The question of where they are on the semantic spectrum frequently arises. To determine where your systems are, some of the following questions are frequently useful.

  1. Is there a centralized glossary of terms for the subject matter?
  2. Does the glossary of terms include precise definitions for each term?
  3. Is there a central repository to store data elements that includes data type information?
  4. Is there an approval process associated with the creation of and changes to data elements?
  5. Are coded data elements fully enumerated? Does each enumeration have a full definition?
  6. Is there a process in place to remove duplicate or redundant data elements from the metadata registry?
  7. Are one or more classification schemes used to classify data elements?
  8. Are document exchanges and web services created using the data elements?
  9. Can the central metadata registry be used as part of a model-driven architecture?
  10. Are there staff members trained to extract data elements that can be reused in metadata structures?
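
Some of these questions, such as questions 2, 5 and 6, can be checked mechanically once data elements are held in a registry. The sketch below assumes a hypothetical in-memory registry in which each element is a dictionary with `name`, `definition` and optional `enumerations` keys; it does not reflect the API of any particular metadata registry product.

```python
def audit_registry(elements: list[dict]) -> dict:
    """Report elements that lack definitions, enumerated values that
    lack definitions, and groups of elements sharing one definition
    (candidate duplicates)."""
    missing_definition = []
    undefined_enumerations = []
    by_definition = {}

    for element in elements:
        definition = element.get("definition", "").strip()
        if not definition:
            missing_definition.append(element["name"])
        else:
            # Normalise case and spacing so trivially different
            # wordings of the same definition are grouped together.
            key = " ".join(definition.lower().split())
            by_definition.setdefault(key, []).append(element["name"])
        for code, meaning in element.get("enumerations", {}).items():
            if not meaning.strip():
                undefined_enumerations.append((element["name"], code))

    duplicates = [names for names in by_definition.values() if len(names) > 1]
    return {
        "missing_definition": missing_definition,
        "undefined_enumerations": undefined_enumerations,
        "possible_duplicates": duplicates,
    }


registry = [
    {"name": "PersonGivenName", "definition": "The first name of a person."},
    {"name": "FirstName", "definition": "the first name of a  person."},
    {"name": "OrderStatusCode", "definition": "The stage of an order.",
     "enumerations": {"OPEN": "Placed but not shipped.", "SHIPPED": ""}},
    {"name": "LegacyFlag", "definition": ""},
]
report = audit_registry(registry)
print(report)
```

Matching on normalised definition text finds only trivial duplicates. Elements that mean the same thing but are defined in different words still need human review, which is one reason an approval process (question 4) matters.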

Strategic nature of semantics

Today, much of the World Wide Web is published as Hypertext Markup Language (HTML). Search engines are severely hampered by their inability to understand the meaning of published web pages. These limitations have led to the advent of the Semantic Web movement.

In the past, many organizations that created custom database applications used isolated teams of developers that did not formally publish their data definitions. These teams frequently used internal data definitions that were incompatible with other computer systems. This made enterprise application integration and data warehousing extremely difficult and costly. Many organizations today require that teams consult a centralized data registry before new applications are created.

The job title of an individual who is responsible for coordinating an organization's data is data architect.

History

The first reference to this term was at the 1999 AAAI Ontologies Panel. The panel was organized by Chris Welty, who at the prodding of Fritz Lehmann and in collaboration with the panelists (Fritz, Mike Uschold, Mike Gruninger, and Deborah McGuinness) came up with a "spectrum" of kinds of information systems that were, at the time, referred to as ontologies. The "ontology spectrum" picture appeared in print in the introduction to Formal Ontology and Information Systems: Proceedings of the 2001 Conference. The ontology spectrum was also featured in a talk at the Semantics for the Web meeting in 2000 at Dagstuhl by Deborah McGuinness. McGuinness produced a paper describing the points on that spectrum that appeared in the book that emerged (much later) from that workshop called "Spinning the Semantic Web." Later, Leo Obrst extended the spectrum into two dimensions (which technically is not really a spectrum anymore) and added a lot more detail, which was included in his book, The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management.

The concept of semantic precision in business systems was popularized by Dave McComb in his book Semantics in Business Systems: The Savvy Manager's Guide, published in 2003, in which he frequently uses the term semantic precision.

This discussion centered on a 10-level partition that included the following levels (listed in order of increasing semantic precision):

  1. Simple Catalog of Data Elements
  2. Glossary of Terms and Definitions
  3. Thesauri, Narrow Terms, Relationships
  4. Informal "Is-a" relationships
  5. Formal "Is-a" relationships
  6. Formal instances
  7. Frames (properties)
  8. Value Restrictions
  9. Disjointness, Inverse, Part-of
  10. General Logical Constraints

Note that an earlier special emphasis on adding formal is-a relationships to the spectrum has since been dropped.

The company Cerebra has also popularized this concept by describing the data formats that exist within an enterprise in terms of their ability to store semantically precise metadata. Their list includes:

  1. HTML
  2. PDF
  3. Word Processing documents
  4. Microsoft Excel
  5. Relational databases
  6. XML
  7. XML Schema
  8. Taxonomies
  9. Ontologies

What these concepts have in common is the ability to store information with increasing precision in order to support intelligent agents.
