Ontology alignment

Ontology alignment, or ontology matching, is the process of determining correspondences between concepts in ontologies. A set of correspondences is also called an alignment. The phrase takes on a slightly different meaning in computer science, cognitive science, and philosophy.

Computer science

For computer scientists, concepts are expressed as labels for data. Historically, ontology alignment arose from the need to integrate heterogeneous databases that were developed independently and therefore each had its own data vocabulary. In the Semantic Web context, where many actors provide their own ontologies, ontology matching has become critical for helping heterogeneous resources interoperate. Ontology alignment tools find classes of data that are semantically equivalent, for example "truck" and "lorry". The classes are not necessarily logically identical. According to Euzenat and Shvaiko (2007), [1] there are three major dimensions of similarity: syntactic, external, and semantic. Coincidentally, they roughly correspond to the dimensions identified by cognitive scientists below. A number of tools and frameworks have been developed for aligning ontologies, some inspired by cognitive science and some developed independently.

Ontology alignment tools have generally been developed to operate on database schemas, [2] XML schemas, [3] taxonomies, [4] formal languages, entity-relationship models, [5] dictionaries, and other label frameworks. These inputs are usually converted to a graph representation before being matched. Since the emergence of the Semantic Web, such graphs can be represented in the Resource Description Framework line of languages by triples of the form <subject, predicate, object>, as illustrated in the Notation 3 syntax. In this context, aligning ontologies is sometimes referred to as "ontology matching".
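As a rough illustration of the graph representation described above, the following Python sketch stores a tiny ontology as a set of (subject, predicate, object) triples and queries it. The `ex:` names and the helpers `classes_of` and `superclasses_of` are hypothetical, and the `rdf:`, `rdfs:` and `owl:` terms are kept as plain prefixed strings rather than resolved IRIs.

```python
# A tiny ontology held as (subject, predicate, object) triples.
# All "ex:" names are invented for illustration only.
Triple = tuple[str, str, str]

ONTOLOGY: set[Triple] = {
    ("ex:Vehicle", "rdf:type", "owl:Class"),
    ("ex:Truck", "rdf:type", "owl:Class"),
    ("ex:Truck", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:Truck", "rdfs:label", "truck"),
}


def classes_of(triples: set[Triple]) -> set[str]:
    """Return every subject that is declared to be an owl:Class."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o == "owl:Class"}


def superclasses_of(triples: set[Triple], cls: str) -> set[str]:
    """Return the direct superclasses asserted for `cls`."""
    return {o for (s, p, o) in triples if s == cls and p == "rdfs:subClassOf"}


declared = classes_of(ONTOLOGY)
parents = superclasses_of(ONTOLOGY, "ex:Truck")
```

A matcher working on this representation would compare the nodes and edges of two such triple sets; a dedicated RDF library would normally replace the plain tuples used here.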

Recent approaches tackle ontology alignment by automatically computing a matching first and then deriving a mapping from that matching. Systems such as DSSim, X-SOM [6] and COMA++ have reported very high precision and recall. [3] The Ontology Alignment Evaluation Initiative aims to evaluate, compare and improve the different approaches.
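Precision and recall of an alignment are usually measured against a reference alignment. The sketch below assumes that an alignment can be reduced to a set of (term, term) pairs and that a correspondence counts as correct only if it appears verbatim in the reference; the function name `precision_recall` and the `o1:`/`o2:` terms are hypothetical, and real evaluation campaigns may use more refined measures.

```python
# Precision and recall of a found alignment against a reference alignment.
# Both alignments are simplified to sets of (term_in_o1, term_in_o2) pairs.
Correspondence = tuple[str, str]


def precision_recall(
    found: set[Correspondence], reference: set[Correspondence]
) -> tuple[float, float]:
    """Return (precision, recall); an empty denominator yields 0.0."""
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference alignment and system output.
reference = {
    ("o1:Truck", "o2:Lorry"),
    ("o1:Car", "o2:Automobile"),
    ("o1:Person", "o2:Human"),
}
found = {
    ("o1:Truck", "o2:Lorry"),
    ("o1:Car", "o2:Automobile"),
    ("o1:Car", "o2:Cart"),
    ("o1:Bus", "o2:Coach"),
}

precision, recall = precision_recall(found, reference)
```

In this made-up example two of the four returned correspondences are correct, and two of the three reference correspondences are found.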

Formal definition

Given two ontologies $O_i = \langle C_i, R_i, I_i, T_i, V_i \rangle$ and $O_j = \langle C_j, R_j, I_j, T_j, V_j \rangle$, where $C$ is the set of classes, $R$ is the set of relations, $I$ is the set of individuals, $T$ is the set of data types, and $V$ is the set of values, we can define different types of (inter-ontology) relationships. [1] Such relationships are collectively called alignments and can be categorized along different dimensions, such as whether they are atomic, whether they are homogeneous, and the type of relation they express (for example, subsumption).

Subsumption, atomic, homogeneous alignments are the building blocks for richer alignments and have well-defined semantics in every description logic. Ontology matching and mapping can now be introduced more formally.

An atomic homogeneous matching is an alignment that carries a similarity degree $s \in [0, 1]$, describing the similarity of two terms of the input ontologies $O_i$ and $O_j$. A matching can be either computed, by means of heuristic algorithms, or inferred from other matchings.

Formally, a matching is a quadruple $m = \langle id, t_i, t_j, s \rangle$, where $t_i$ and $t_j$ are homogeneous ontology terms and $s$ is the similarity degree of $m$. A (subsumption, homogeneous, atomic) mapping is defined as a pair $\mu = \langle t_i, t_j \rangle$, where $t_i$ and $t_j$ are homogeneous ontology terms.
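The definitions above can be sketched in Python as two small record types, one for a matching and one for a mapping. Everything named here is an assumption made for illustration: the field names, the use of a string identifier as the fourth component of the quadruple, the character-based `label_similarity` heuristic (built on the standard library's `difflib.SequenceMatcher`), and the threshold rule that promotes a matching to a candidate mapping. The source does not prescribe any of these choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Matching:
    """A quadruple: identifier, two terms, and a similarity degree in [0, 1]."""
    match_id: str
    term_i: str
    term_j: str
    similarity: float


@dataclass(frozen=True)
class Mapping:
    """A pair of homogeneous ontology terms."""
    term_i: str
    term_j: str


def label_similarity(a: str, b: str) -> float:
    """Illustrative syntactic heuristic: case-insensitive string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compute_matchings(terms_i: list[str], terms_j: list[str]) -> list[Matching]:
    """Score every pair of terms drawn from the two ontologies."""
    pairs = [(ti, tj) for ti in terms_i for tj in terms_j]
    return [
        Matching(f"m{n}", ti, tj, label_similarity(ti, tj))
        for n, (ti, tj) in enumerate(pairs)
    ]


def to_mappings(matchings: list[Matching], threshold: float = 0.8) -> list[Mapping]:
    """Assumed rule: keep pairs whose similarity reaches the threshold."""
    return [Mapping(m.term_i, m.term_j) for m in matchings if m.similarity >= threshold]


matchings = compute_matchings(["Truck", "Person"], ["truck", "Lorry"])
mappings = to_mappings(matchings)
```

With these inputs only the pair ("Truck", "truck") passes the threshold. A purely syntactic measure gives "Truck" and "Lorry" a low score even though they denote the same concept, which suggests why external and semantic evidence is also used in practice.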

Cognitive science

For cognitive scientists interested in ontology alignment, the "concepts" are nodes in a semantic network that reside in brains as "conceptual systems." The focal question is: if everyone has unique experiences and thus different semantic networks, how can we ever understand each other? This question has been addressed by a model called ABSURDIST (Aligning Between Systems Using Relations Derived Inside Systems for Translation). The model identifies three major dimensions of similarity, expressed as equations for "internal similarity, external similarity, and mutual inhibition." [7]

Ontology alignment methods

Two research subfields have emerged in ontology mapping: monolingual ontology mapping and cross-lingual ontology mapping. The former refers to the mapping of ontologies in the same natural language, whereas the latter refers to "the process of establishing relationships among ontological resources from two or more independent ontologies where each ontology is labelled in a different natural language". [8] Existing matching methods in monolingual ontology mapping are discussed in Euzenat and Shvaiko (2007). [1] Approaches to cross-lingual ontology mapping are presented in Fu et al. (2011). [9]

References

  1. Jérôme Euzenat and Pavel Shvaiko. 2013. Ontology matching, Springer-Verlag, ISBN 978-3-642-38720-3.
  2. J. Berlin and A. Motro. 2002. Database Schema Matching Using Machine Learning with Feature Selection. Proc. of the 14th International Conference on Advanced Information Systems Engineering, pp. 452-466.
  3. D. Aumueller, H. Do, S. Massmann, E. Rahm. 2005. Schema and ontology matching with COMA++. Proc. of the 2005 International Conference on Management of Data, pp. 906-908.
  4. S. Ponzetto, R. Navigli. 2009. Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia. Proc. of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, California, pp. 2083-2088.
  5. A. H. Doan, A. Y. Halevy. 2005. Semantic integration research in the database community: A brief survey. AI Magazine, 26(1).
  6. Carlo A. Curino, Giorgio Orsi and Letizia Tanca. 2007. X-SOM: A Flexible Ontology Mapper. International Workshop on Semantic Web Architectures for Enterprises (SWAE'07) in Conjunction with the 18th International Conference on Database and Expert Systems Applications (DEXA'07).
  7. R. Goldstone and B. Rogosky. 2002. Using relations within conceptual systems to translate across conceptual systems. Cognition 84, pp. 295–320.
  8. Bo Fu, Rob Brennan, Declan O'Sullivan. 2012. A Configurable Translation-Based Cross-Lingual Ontology Mapping System to adjust Mapping Outcomes. Journal of Web Semantics, Volume 15, pp. 15-36, ISSN 1570-8268.
  9. Bo Fu, Rob Brennan, Declan O'Sullivan. 2011. Using Pseudo Feedback to Improve Cross-Lingual Ontology Mapping. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), LNCS 6643, pp. 336-351, Heraklion, Greece, May 2011.
