OntoClean

OntoClean is a methodology for analyzing ontologies based on formal, domain-independent properties of classes (the metaproperties) developed by Nicola Guarino and Chris Welty.

Overview and History

OntoClean was the first attempt to formalize notions of ontological analysis for information systems. The idea was to justify the kinds of decisions that experienced ontology builders make, and explain the common mistakes of the inexperienced. Alan Rector, during a debate at the KR-2002 conference in Toulouse, said, "What you have done is reduce the amount of time I spend arguing with medics."

The notions Guarino & Welty focused on were drawn from philosophical ontology. They were not after the seemingly endless arguments about what the right ontology of the universe is, but rather the techniques these philosophers use to analyze, support, and criticize each other's arguments. These techniques make very little, if any, commitment to a particular ontology; instead, they expose what are often very subtle distinctions.

The ideas underlying OntoClean appeared first in the literature in a series of three papers published in 2000. [1] [2] [3] The name OntoClean does not appear in the literature until 2002. [4] According to Thompson-ISI, work on OntoClean was the most cited of academic papers on Ontology. [5] OntoClean was important as it was the first formal methodology for ontology engineering, applying scientific principles to a field whose practice was mostly art.

Note on terminology

In logic, a property is a unary predicate in intension; in other words, a property is what it means to be a member of a class. For example, we say that instances of the Person class have the property of "being a person." In the semantic web, by contrast, a property is a binary relation.

The distinction between property and class is subtle, and probably not critical to understanding OntoClean. This article follows the OntoClean publications and consistently uses "property" in its original logical sense, so one can treat "property" and "class" as synonymous. Thus a metaproperty is a property of a property or class.

Metaproperties

The basis of OntoClean is a set of domain-independent properties of classes, the OntoClean metaproperties: identity, unity, rigidity, and dependence. Later work by Welty & Andersen [6] has added two more metaproperties: permanence and actuality.

Identity

Identity is fundamental to ontology, and especially to information systems ontologies. Identity is well known in metaphysics and in database conceptual modeling. In the latter case, it is an accepted best practice to specify a primary key for rows in a table. If "two" rows have identical primary keys, they are considered the same row.

More important for ontology are questions of identity that expose the existence of, or at least the need to represent, other entities. Here the issue at stake is finding the conditions under which a proposed entity would be both the same and different. The classic example is an amount of clay that is shaped into a statue. If you use the same clay but reshape it into a different statue, is it the same entity? If so, how could it be different? If not, how could it be the same? In conceptual modeling, it is understood that when such an ambiguity arises, one should treat it as two different entities to account for a situation where one changes and the other stays the same.

In OntoClean, identity criteria are associated with, or carried by, some classes of entities, called sortals. A sortal is a class all of whose instances are identified in the same way. In information systems, these criteria are often extrinsic, like a social security number or universally unique id, which is not interesting from an ontological point of view. Identity criteria should be informative: they should help us and others understand what a class means. A triangle, for example, can be identified by the length of its three sides, or by two sides and an interior angle, etc. This says a lot about what is intended by the triangle class here, e.g. the same triangle could be in many places at the same time. Someone else may have an ontology in which the triangle class has different identity criteria, such that different drawings are always different triangles, even if they are the same size. Identity criteria (and OntoClean, for that matter) do not tell you that one of these definitions of triangle is right or wrong, just that they are different and thus that the classes are different.
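The contrast between the two triangle classes can be sketched in code. The Python below is only an illustration and is not part of OntoClean; the names `ShapeTriangle`, `DrawnTriangle`, `identity_key`, and `same_entity` are hypothetical, chosen for this example.

```python
import itertools


class ShapeTriangle:
    """Hypothetical class whose identity criterion is the three side lengths."""

    def __init__(self, a, b, c):
        # Sorting makes the key independent of the order the sides are given in.
        self.sides = tuple(sorted((a, b, c)))

    def identity_key(self):
        return self.sides


class DrawnTriangle:
    """Hypothetical class whose identity criterion is the individual drawing."""

    _next_id = itertools.count()

    def __init__(self, a, b, c):
        self.sides = tuple(sorted((a, b, c)))
        # Every drawing receives its own identifier, whatever its size.
        self.drawing_id = next(DrawnTriangle._next_id)

    def identity_key(self):
        return self.drawing_id


def same_entity(x, y):
    """Two individuals are the same only under a shared identity criterion."""
    return type(x) is type(y) and x.identity_key() == y.identity_key()


print(same_entity(ShapeTriangle(3, 4, 5), ShapeTriangle(5, 3, 4)))  # True
print(same_entity(DrawnTriangle(3, 4, 5), DrawnTriangle(3, 4, 5)))  # False
```

Neither class is the correct one. The sketch only shows that the two classes answer the question "is this the same triangle?" differently, which is why they count as different classes.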

Identity criteria and sortals are intuitively meant to account for the linguistic habit of associating identity with certain classes. In the classical statue and clay example, we naturally say "the same clay" or "the same statue", indicating that there are identity criteria that are peculiar to each class.

Being a sortal is the first OntoClean metaproperty, indicated with the +I superscript (−I for non-sortals) on a class in the original notation. +I (but not −I) is inherited down the class hierarchy: if a class is a sortal, then all its subclasses are as well.

Unity

There are certain properties that only hold of individuals that are wholes. In formal ontology, wholes are often distinguished from mere sums, which are individuals whose boundaries are, in a sense, arbitrary. For example, consider the class clay. An instance of this class might be some amount of the material (this is only one possible meaning, of course), such that any (in fact, every) arbitrary subsection of the amount would be a different instance of the same class. By contrast, instances of the class Person are, typically, not decomposable in this fashion.

For the purposes of OntoClean, wholes are individuals all of whose parts are related to each other, and only to each other, by some distinguished relation. This relation can be viewed as a generalized connection relation. Mere sums have no such relation since any decomposition of a mere sum is connected to any larger sum, which is not one of its parts, by the same relation.

Unity is the metaproperty, indicated by +U, of classes all of whose individuals are wholes under the same relation. As with identity, OntoClean does not require that the relation itself be specified; often it is enough to know that the relation exists. Intuitively, a class has unity if all its instances are the same type of whole, which is typically true of classes of natural objects. Non-unity, indicated by −U, is the metaproperty of classes whose instances are not all wholes, or not all wholes by the same relation. A further and more useful refinement of non-unity is anti-unity, indicated by ~U, the metaproperty of classes none of whose instances are wholes, such as classes of mere sums. +U and ~U (but not −U) are inherited down the class hierarchy.

Rigidity

Leibniz's law, roughly that identical entities share all their properties, makes good sense when first considered. However, it doesn't take long to see how considerations of time cause problems between most ontologies (especially semantic web ontologies) and Leibniz's law. For example, I might have a beard on one day and shave it off the next, yet I am the same entity at both times. How is it possible for me to be the same if I have changed?

There are many logical approaches to this classic dilemma (including simply ignoring it). The most common is to consider some properties to be essential: an essential property of an entity (recall from the note on terminology above that properties are unary predicates) is a property that cannot change, and these are the properties for which Leibniz's law holds. Other properties of an entity, which can change, are non-essential and cannot be involved in identity.

Some properties are essential to all their instances. Think of the property of being a person, usually represented by the class Person. For every entity that has this property, the property is essential. So at least one of the properties that has not changed about me when I shave my beard is that I am a person. These properties, that are essential to all their instances, are rigid properties.

Rigid properties are designated by +R, and properties that are not rigid by −R. An important specialization of non-rigidity is anti-rigidity (~R): an anti-rigid property is one that must be changeable. Think of being a student: every student must possibly not be a student. ~R (but not −R or +R) is inherited down the class hierarchy.

Note that these are just examples — it is certainly possible to have an ontology in which Person is anti-rigid. Imagine an ontology of mystical beliefs, for example, in which an entity changes from Person to Spirit upon death. In order for the individual to be the same across this change, being a person must not be essential and furthermore must be changeable (i.e. anti-rigid).
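Rigidity is a claim about what is possible, so recorded data can never establish it; data can only supply counterexamples. As a rough sketch under that caveat, the hypothetical Python function `rigidity_counterexamples` below scans time-ordered snapshots of each entity's properties and reports the entities that held a property at some time but not at every time. The data layout and all names are assumptions made for this illustration.

```python
def rigidity_counterexamples(history, prop):
    """Return entities that hold `prop` in some snapshot but not in all of them.

    `history` maps an entity name to a list of property sets, one set per
    point in time at which the entity exists (an assumed, simplified layout).
    An empty result does not prove rigidity; it only means that no
    counterexample was recorded.
    """
    return sorted(
        entity
        for entity, snapshots in history.items()
        if any(prop in s for s in snapshots)
        and not all(prop in s for s in snapshots)
    )


history = {
    "alice": [{"Person", "Student"}, {"Person"}],
    "bob": [{"Person", "Bearded"}, {"Person"}],
}

print(rigidity_counterexamples(history, "Person"))   # []
print(rigidity_counterexamples(history, "Student"))  # ['alice']
print(rigidity_counterexamples(history, "Bearded"))  # ['bob']
```

A counterexample shows that a property cannot be tagged +R. Deciding between −R and ~R still requires a modelling judgment about whether every instance could lose the property.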

Rigidity should not be confused with Kripke's notion of Rigid Designators, which are particulars. The term rigid in OntoClean is meant to describe the instanceOf link between an individual and a rigid class — it cannot be broken.

Dependence

Dependence is a varied notion. In the core OntoClean papers, Guarino & Welty used a kind of dependence that captures a meta-property of certain relational roles. A property is dependent if each instance of it implies the existence of another entity. The property Student, for example, is dependent, since to be a student there must be a teacher; for every instance of student there is at least one instance of teacher. In later work for [Dolce] this was noted to subsume two kinds of property dependence: specific constant dependence and generic constant dependence. The former accounts for dependence on specific entities, e.g. each person is dependent on having a particular brain. The latter accounts for the Student/Teacher case, where any instance of Teacher will do.
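The difference between the two kinds of dependence can be made concrete with a small check over a snapshot of what exists. The Python below is a loose sketch, not a formalization from the OntoClean or DOLCE literature; the function names and the data layout are assumptions made for this example.

```python
def generic_dependence_holds(extension, dependent, required):
    """Generic dependence: if any `dependent` exists, some `required` exists.

    `extension` maps a class name to the set of its current instances.
    Any instance of `required` will do.
    """
    return not extension.get(dependent) or bool(extension.get(required))


def specific_dependence_violations(existing, depends_on):
    """Specific dependence: each entity needs one particular other entity.

    `depends_on` maps an entity to the specific entity it depends on.
    Returns the existing entities whose specific counterpart is missing.
    """
    return sorted(
        entity
        for entity, needed in depends_on.items()
        if entity in existing and needed not in existing
    )


# Student/Teacher: any teacher satisfies the dependence.
print(generic_dependence_holds(
    {"Student": {"alice"}, "Teacher": {"carol"}}, "Student", "Teacher"))  # True
print(generic_dependence_holds(
    {"Student": {"alice"}, "Teacher": set()}, "Student", "Teacher"))      # False

# Person/brain: only that person's own brain satisfies the dependence.
print(specific_dependence_violations(
    {"alice", "bob_brain"}, {"alice": "alice_brain"}))  # ['alice']
```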

There are many other kinds of dependence; see [Fine and Smith, 1983] and especially [Simons, 1987]. [7] Adapting them to the OntoClean framework is an open problem.

Being dependent is indicated with +D, being independent with −D. +D (but not −D) is inherited down the class hierarchy.
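The inheritance rules stated in the sections above (+I, +U, ~U, ~R, and +D are inherited by subclasses) can be turned into a simple consistency check over a tagged taxonomy. The Python below is a minimal sketch under stated assumptions: the function `check_taxonomy`, the data layout, and the tags assigned to the example classes are illustrative choices, not taken from the OntoClean papers. As a simplification, a subclass that lacks an inherited tag is always reported, even if it carries a coarser tag such as −R under a ~R parent.

```python
# Tags that the text says are inherited down the class hierarchy.
INHERITED = {"+I", "+U", "~U", "~R", "+D"}


def check_taxonomy(tags, subclass_of):
    """Report (child, parent, tag) where an inherited tag is missing on the child.

    `tags` maps a class name to its set of metaproperty tags.
    `subclass_of` is a list of (child, parent) pairs.
    """
    violations = []
    for child, parent in subclass_of:
        for tag in sorted(tags[parent] & INHERITED):
            if tag not in tags[child]:
                violations.append((child, parent, tag))
    return violations


# Illustrative tagging only; another ontology could tag these differently.
tags = {
    "Person": {"+I", "+U", "+R", "-D"},
    "Student": {"+I", "+U", "~R", "+D"},
    "Clay": {"+I", "~U", "+R", "-D"},
    "Statue": {"+I", "+U", "+R", "-D"},
}

# Student under Person is consistent; Statue under Clay is not.
print(check_taxonomy(tags, [("Student", "Person"), ("Statue", "Clay")]))
# [('Statue', 'Clay', '~U')]

# Reversing the first link puts a rigid class under an anti-rigid one.
print(check_taxonomy(tags, [("Person", "Student")]))
# [('Person', 'Student', '+D'), ('Person', 'Student', '~R')]
```

A reported violation does not say which side is wrong. It indicates that either the subclass link or one of the tags needs to be revisited.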

References

  1. Guarino, Nicola, and Chris Welty. 2000. Ontological Analysis of Taxonomic Relationships. In Laender, A., and Storey, V., eds., Proceedings of ER-2000: The 19th International Conference on Conceptual Modeling. Springer-Verlag. October, 2000.
  2. Guarino, Nicola, and Chris Welty. 2000. A Formal Ontology of Properties. In Dieng, R., and Corby, O., eds., Proceedings of EKAW-2000: The 12th International Conference on Knowledge Engineering and Knowledge Management. Berlin: Springer LNCS Vol. 1937/2000. Pp. 97–112. October, 2000.
  3. Guarino, Nicola, and Chris Welty. 2000. Identity, Unity, and Individuation: Towards a Formal Toolkit for Ontological Analysis. In Horn, W., ed., Proceedings of ECAI-2000: The European Conference on Artificial Intelligence. Amsterdam: IOS Press. Pp. 219–223. August, 2000.
  4. Guarino, Nicola, and Chris Welty. 2002. Evaluating Ontological Decisions with OntoClean. Communications of the ACM. 45(2):61–65. New York: ACM Press.
  5. Thompson. "Emerging Research Fronts:Ontologies".
  6. Welty and Andersen, 2005. Towards OntoClean 2.0: A framework for Rigidity.
  7. Simons, P., 1987, Parts: A Study in Ontology, Oxford: Clarendon Press.