Gellish

Last updated

Gellish is an ontology language for data storage and communication, designed and developed by Andries van Renssen since mid-1990s. [1] It started out as an engineering modeling language ("Generic Engineering Language", giving it the name, "Gellish") but evolved into a universal and extendable conceptual data modeling language with general applications. Because it includes domain-specific terminology and definitions, it is also a semantic data modelling language and the Gellish modeling methodology is a member of the family of semantic modeling methodologies.

Contents

Although its concepts have 'names' and definitions in various natural languages, Gellish is a natural-language-independent formal language. Any natural language variant, such as Gellish Formal English is a controlled natural language. Information and knowledge can be expressed in such a way that it is computer-interpretable, as well as system-independent and natural language independent. Each natural language variant is a structured subset of that natural language and is suitable for information modeling and knowledge representation in that particular language. All expressions, concepts and individual things are represented in Gellish by (numeric) unique identifiers (Gellish UID's). This enables software to translate expressions from one formal natural language to any other formal natural languages.

Overview

Gellish is intended for the expression of facts (statements), queries, answers, etc. For example, for the complete and unambiguous specification of business processes, products, facilities and physical processes; for information about their purchasing, fabrication, installation, operation and maintenance; and for the exchange of such information between systems, although in a system-independent, computer-interpretable and language-independent way. It is also intended for the expression of knowledge and requirements about such things.

The definition of Gellish can be derived from the definition of Gellish Formal English by considering 'expressions' as relations between the Unique Identifiers only. The definition of Gellish Formal English is provided in the Gellish English Dictionary-Taxonomy, which is a large 'smart dictionary' of concepts with relations between those concepts (earlier it was called STEPlib). The Dictionary-Taxonomy is called a 'smart dictionary', because the concepts are arranged in a subtype-supertype hierarchy, making it a taxonomy that supports inheritance of properties from supertype concepts to subtype concepts. Furthermore, because together with other relations between the concepts, the smart dictionary is extended into an ontology. Gellish has basically an extended object-relation-object structure to express facts by relations, whereas each fact may be accompanied by a number of auxiliary facts about the main fact. Examples of auxiliary facts are author, date, status, etc. To enable an unambiguous interpretation, Gellish includes the definition of a large number (more than 650) of standard relation types that determine the rich semantic expression capability of the language.

In principle, for every natural language there is a Gellish variant that is specific for that language. For example, Gellish Dutch (Gellish Nederlands), Gellish Italian, Gellish English, Gellish Russian, etc. Gellish does not invent its own terminology, such as Esperanto, but uses the terms from natural languages. Thus, the Gellish English dictionary-taxonomy is like an (electronic) ordinary dictionary that is extended with additional concepts and with relations between the concepts.

For example, the Gellish dictionary-taxonomy contains definitions of many concepts that also appear in ordinary dictionaries, such as kinds of physical objects like building, airplane, car, pump, pipe, properties such as mass and color, scales such as kg and bar, as well as activities and processes, such as repairing and heating, etc. In addition to that, the dictionary contains concepts with composed names, such as 'hairpin heat exchanger', which will not appear in ordinary dictionaries. The main difference with ordinary dictionaries is that the Gellish dictionary also includes definitions of standard kinds of relations (relation types), which are denoted by standard Gellish English phrases. For example, it defines relation types such as is a subtype of, is classified as a, has as aspect, is quantified as, can be a performer of a, shall have as part a, etc. Such standard relation types and concept definitions enable a Gellish-powered software to correctly and unambiguously interpret Gellish expressions.

Gellish expressions may be expressed in any suitable format, such as SQL or RDF or OWL or even in the form of spreadsheet tables, provided that their content is equivalent to the tabular form of Gellish Naming Tables (which define the vocabulary) and Fact Tables (together defining a Gellish Database content) or equivalent to Gellish Message Tables (for data exchange). An example of the core of a Message Table is the following:

Left-hand termRelation typeRight-hand term
centrifugal pumpis a subtype ofpump
P-123is classified as acentrifugal pump
P-123has as aspectthe mass of P-123
the mass of P-123is classified as amass
the mass of P-123is qualified as50 kg

A full Gellish Message Table requires additional columns for unique identifiers, the intention of the expression, the language of the expression, cardinalities, unit of measure, the validity context, status, creation date, author, references, and various other columns. Gellish Light only requires the three above columns, but then it does not support, for example, capabilities to distinguish homonyms; automated translation; and version management, etc. Those capabilities and several others are supported by Full Gellish. The following example illustrates the use of some additional columns in a Gellish Message Table, where UoM stands for 'unit of measure'.

Fact UIDIntentionLeft UIDLeft termRelat. UIDRelation typeRight UIDRight termUID of UoMUoMStatus
201statement130058centrifugal pump1146is a subtype of130206pumpaccepted
202statement102P-1231225is classified as a130058centrifugal pumpproposed
203statement102P-1231727has as aspect103mass of P-123proposed
204statement103mass of P-1231225is classified as a550020massproposed
205statement103mass of P-1235020is qualified as92030350570039kgproposed

The collection of standard relation types define the kinds of facts that can be expressed in Gellish, although anybody can create their own proprietary extension of the dictionary and thus can add concepts and relation types as and when required.

As Gellish is a formal language, any Gellish expression may only use concepts that are defined in a Gellish dictionary, or the definition of any concept should be ad hoc within the collection of Gellish expressions. Knowledge bases can be created by using the Gellish language and its concept definitions in a Gellish Dictionary. Example applications of a Gellish dictionary are usage as a source of classes for classification of equipment, documents, etc., or as standard terminology (metadata) or to harmonize data in various computer systems, or as a thesaurus or taxonomy in a search engine.

Gellish enables automatic translation, and enables the use of synonyms, abbreviations and codes as well as homonyms, due to the use of a unique natural language independent identifier (UID) for every concept. For example, 130206 (pump) and 1225 (is classified as a). This ensures that concepts are identified in a natural language independent way. Therefore, various Gellish Dictionaries use the same UID's for the same concept. This means that those dictionaries provide translations of the names of the objects, as well as a translation of the standard relation types. The UID's enable that information and knowledge that is expressed in one language variant of Gellish can be automatically translated and presented by Gellish-powered software in any other language variant for which a Gellish dictionary is available. For example, the phrase is classified as a and the phrase ist klassifiziert als are denotations of the same UID 1225.

For example, a computer can automatically express the second line in the above example in German as follows:

Left-hand termRelation typeRight-hand term
P-123ist klassifiziert alsZentrifugalpumpe

Questions (queries) can be expressed as well. Queries are facilitated through standardized terms such as what, which, where and when. They can be used in combination with reserved UID's for unknowns in the range 1-100. This enables Gellish expressions for queries, such as:

- query: what <is located in> Paris

Gellish-powered software should be able to provide the correct answer to this query by comparing the expression with the facts in the database, and should respond with:

- answer: The Eiffel Tower <is located in> Paris

Note that the automatic translation capability implies that a query/question that is expressed in a particular language, say English, can be used to search in a Gellish database in another language (say Chinese), whereas the answer can be presented in English.

Information models in Gellish

Information models can be distinguished in two main categories:

All these categories of models can include drawings and other documents as well as 3D shape information (the core of 3D models). They all can be expressed and integrated in Gellish.

The classification relation between individual things and kinds of things makes the definitions, knowledge and requirements about kinds of things available for the individual things. Furthermore, the subtype-supertype hierarchy in a Gellish Dictionary-Taxonomy implies that the knowledge and requirements that are specified for a kind of thing are inherited by all their subtypes. As a consequence, when somebody designs an individual item and classifies it by a particular kind, then all the knowledge and requirements that are known for the supertypes of that kind will also be recognized and can be made available automatically.

Each category of information model requires its own semantics, because the expression of the individual fact that something real is the case requires other kinds of relations than the expression of the general fact that something can be the case, which again differs from a fact that expresses that something shall be the case in a particular context or that something is by definition always the case. These semantic differences cause that the various categories of information models require their own subsets of standard relation types. Therefore Gellish makes a distinction between the following categories of relation types:

Gellish databases and data exchange messages

Gellish is typically expressed in the form of Gellish Data Tables. There are three categories of Data Tables:

A Gellish Database typically consists of one or more Naming Tables and one or more Fact Tables together. Data Tables and Fact Tables are one-to-one equivalent to Message Tables.

All table columns are standardised, so that each Gellish data table of a category contains the same standard columns, or of a subset of the standard ones. This provides standard interfaces for exchange of data between application systems. The content of data tables may also include constraints and requirements (data models) that specify the kind of data that should and may be provided for particular applications. Such requirements models make dedicated database designs superfluous. The Gellish Data Tables can be used as part of a central database or can form distributed databases, but tables can also be exchanged in data exchange files or as body of Gellish Messages.

A Naming Table relates terms in a language and language community ('speech community') to a unique identifier. This enables the unambiguous use of synonyms, abbreviations and codes as well as homonyms in multiple languages. The following table is an example of a Naming Table:

UID of languageUID of language communityUID for termInverse indicatorTerm (name)Comment
9100361932631302060pumpEnglish engineering term for concept 130206
9100381932631302060PumpeGerman
9100371932631302060pompDutch

The inverse indicator is only relevant when phrases are used to denote relation types, because each standard relation type is denoted by at least one standard phrase as well as at least one standard inverse phrase. For example, the phrase <is a part of> has as inverse phrase <has as part>. Both phrases denote the same kind of relation (a composition relation). However, when the inverse phrase is used to express a fact, then the left hand and right hand objects in the expression should have an inverse position. Thus, the following expressions will be recognized as two equally valid expressions of the same fact (with the same Fact UID):

So, the inverse indicator indicates for relation types whether as phrase is a base phrase (1) or an inverse phrase (2).

A Fact Table contains expressions of any facts, each of which is accompanied by a number of auxiliary facts that provide additional information relevant for the main facts. Examples of auxiliary facts are: the intention, status, author, creation date, etc.

A Gellish Fact Table consists of columns for the main fact and a number of columns for auxiliary facts. The auxiliary facts enable to specify things such as roles, cardinalities, validity contexts, units of measure, date of latest change, author, references, etcetera.:

The columns for the main fact in a Fact Table are:

These columns also appear in a Message Table as shown below.

A full Gellish Message table is in fact a combination of a Naming Table and a Fact Table. It contains not only columns for the expression of facts, but also columns for the names of the related objects and the additional columns to express auxiliary facts. This enables the use of a single table, also for the specification and use of synonyms and homonyms, multiple languages, etcetera. The core of a Message Table is illustrated in the following table:

LanguageUID of left-hand objectName of left-hand objectUID of factUID of relation typeName of relation typeUID of right-hand objectName of right-hand objectStatus
English101The Eiffel tower2015138is located in102Parisaccepted
English101The Eiffel tower2021225is classified as a40903toweraccepted
English102Paris2031225is classified as a700008cityaccepted

In the above example, the concepts with the names, as well as the (standard) relation types are selected with their UID's from the Gellish English Dictionary.

A Gellish Database table can be implemented in any tabular format. For example, it can be implemented as a SQL-based database or otherwise, as a STEPfile (according to ISO 10303-21), or as a simple spreadsheet table, as in Excel, such as the Gellish Dictionary itself.

Gellish database tables can also be described in an equivalent form using RDF/Notation3 or XML. A representation of “Gellish in XML” is defined in a standard XML Schema. An XML file with data according to that XML Schema is recommended to have as file extension GML, whereas GMZ stands for “Gellish in XML zipped”.

One of the differences between Gellish and RDF, XML or OWL is that Gellish English includes an extensive English Dictionary of concepts, including also a large (and extendable) set of standard relation types to make computer-interpretable expressions (in a form that is also readable for non-IT professionals). On the other hand, 'languages' such as RDF, XML and OWL only define a few basic concepts, which leaves much freedom for their users to define their own 'domain language' concepts.

This attractive freedom has the disadvantage that users of 'languages' such as RDF, XML or OWL still don't use a common language and still cannot integrate data that stem from different sources. Gellish is designed to provide a real common language, at least to a much larger extent and therefore provides much more standardization and commonality in terminology and expressions.

Gellish compared with OWL

OWL (Web Ontology Language/Ontological Web Language) and Gellish are both meant for use on the semantic web. Gellish can be used in combination with OWL, or on its own. There are many similarities between the two languages, such as the use of unique identifiers (Gellish UIDs, OWL URIs) [2] but also important differences. The main differences are as follows:

Target audience and meta level

OWL is a metalanguage, including a basic grammar, but without a dictionary. OWL is meant to be used by computer system developers and ontology developers to create ontologies. Gellish is a language that includes a grammar as well as a dictionary-taxonomy and ontology. Gellish is meant to be used by computer system developers as well as by end-users and can also be used by ontology developers when they want to extend the Gellish ontology or build their own domain ontology. Gellish does not make a distinction between a meta-language and a user language; the concepts from both 'worlds' are integrated in one language. So, the Gellish English dictionary contains concepts that are equivalent to the OWL concepts, but also contains the concepts from an ordinary English dictionary.

Vocabularies and ontologies

OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. In other words, it can be used for the definition of taxonomies or ontologies. The terms in such a vocabulary do not become part of the OWL language. So OWL does not include definitions of the terms in a natural language, such as road, car, bolt or length. However, it can be used to define them and to build an ontology.

The upper ontology part of Gellish can also be used to define terms and the relations between them. However, many of such natural language terms are already defined in the lower part of the Gellish dictionary-taxonomy itself. So in Gellish, terms such as road, car, bolt or length are part of the Gellish language. Therefore, Gellish English is a subset of natural English.

Synonyms and multi-language capabilities

Gellish makes a distinction between concepts and the various terms that are used as names (synonyms, abbreviations and translations) to refer to those concepts in different contexts and languages. Every concept is identified by a unique identifier that is natural-language-independent and can have many different terms in different languages to denote the concept. This enables automatic translation between different natural language versions of Gellish. In OWL, the various terms in different languages and the synonyms are in principle different concepts that need to be declared to be the same by explicit equivalence relations (unless the alternatives are expressed in terms of the alternative label annotation properties). [3] On one hand, the OWL approach is simpler but it makes expressions ambiguous and makes data integration and automated translation significantly more complicated.

Upper ontology

OWL can be regarded as an upper ontology that consists of 54 'language constructs' (constructors or concepts). [4] The upper ontology part of Gellish currently consists of more than 1500 concepts of which about 650 are standard relation types. In addition to that the Gellish Dictionary-Taxonomy contains more than 40,000 concepts. This indicates the large semantic richness and expression capabilities of Gellish. Furthermore, Gellish contains definitions of many facts about the defined concepts that are expressed as relationships between those concepts.

Extensibility

OWL has a fixed set of concepts (terms) that are only extended when the OWL standard is extended. Gellish is extensible by any user, under Open Source conditions.

History

Gellish is a further development of ISO 10303-221 (AP221) and ISO 15926. Gellish is an integration and extension of the concepts that are defined in both standards. The main difference with both ISO standards is that Gellish is easier to implement and has more (precise) semantic expression capabilities and is suitable to express queries and answers as well. The specific philosophy of spatio-temporal parts that is used in ISO 15926 to represent discrete time periods to represent time can also be used in Gellish, however the recommended representation of time in Gellish is the more intuitive method that specifies that facts have a specified validity duration. For example, each property can have multiple numeric values on a scale, which is expressed as multiple facts, whereas for each of those facts an (optional) specification can be added of the moment or time period during which that fact is valid.

A subset of the Gellish Dictionary (Taxonomy) is used to create ISO 15926-4. Gellish in RDF is being standardized as ISO 15926-11.

Related Research Articles

Knowledge representation and reasoning is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation incorporates findings from psychology about how humans solve problems, and represent knowledge in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning.

<span class="mw-page-title-main">Semantic network</span> Knowledge base that represents semantic relations between concepts in a network

A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map. Typical standardized semantic networks are expressed as semantic triples.

In information science, an ontology encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of terms and relational expressions that represent the entities in that subject area. The field which studies ontologies so conceived is sometimes referred to as applied ontology.

Reification is the process by which an abstract idea about a computer program is turned into an explicit data model or other object created in a programming language. A computable/addressable object—a resource—is created in a system as a proxy for a non computable/addressable object. By means of reification, something that was previously implicit, unexpressed, and possibly inexpressible is explicitly formulated and made available to conceptual manipulation. Informally, reification is often referred to as "making something a first-class citizen" within the scope of a particular system. Some aspect of a system can be reified at language design time, which is related to reflection in programming languages. It can be applied as a stepwise refinement at system design time. Reification is one of the most frequently used techniques of conceptual analysis and knowledge representation.

A modeling language is any artificial language that can be used to express data, information or knowledge or systems in a structure that is defined by a consistent set of rules. The rules are used for interpretation of the meaning of components in the structure Programing language.

<span class="mw-page-title-main">Data modeling</span> Creating a model of the data in a system

Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques. It may be applied as part of broader Model-driven engineering (MDE) concept.

<span class="mw-page-title-main">Information model</span>

An information model in software engineering is a representation of concepts and the relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. Typically it specifies relations between kinds of things, but may also include relations with individual things. It can provide sharable, stable, and organized structure of information requirements or knowledge for the domain context.

RDF Schema (Resource Description Framework Schema, variously abbreviated as RDFS, RDF(S), RDF-S, or RDF/S) is a set of classes with certain properties using the RDF extensible knowledge representation data model, providing basic elements for the description of ontologies. It uses various forms of RDF vocabularies, intended to structure RDF resources. RDF and RDFS can be saved in a triplestore, then one can extract some knowledge from them using a query language, like SPARQL.

In information science, an upper ontology is an ontology that consists of very general terms that are common across all domains. An important function of an upper ontology is to support broad semantic interoperability among a large number of domain-specific ontologies by providing a common starting point for the formulation of definitions. Terms in the domain ontology are ranked under the terms in the upper ontology, e.g., the upper ontology classes are superclasses or supersets of all the classes in the domain ontologies.

In computer science and artificial intelligence, ontology languages are formal languages used to construct ontologies. They allow the encoding of knowledge about specific domains and often include reasoning rules that support the processing of that knowledge. Ontology languages are usually declarative languages, are almost always generalizations of frame languages, and are commonly based on either first-order logic or on description logic.

The ISO 15926 is a standard for data integration, sharing, exchange, and hand-over between computer systems.

Data exchange is the process of taking data structured under a source schema and transforming it into a target schema, so that the target data is an accurate representation of the source data. Data exchange allows data to be shared between different computer programs.

The Gellish English Dictionary-Taxonomy is an example of an open-source “smart” electronic dictionary, in which concepts are arranged in a subtype-supertype hierarchy, thus forming a taxonomy. The dictionary-taxonomy is machine readable. It is compliant with the guidelines of ISO 16354. Apart from the fact that it is an English (business-technical) dictionary, it also defines the semantics of Gellish English, which is a computer-interpretable structured subset of the natural English language for data storage and data exchange. The dictionary-taxonomy differs from conventional dictionaries because of several additional capabilities. Therefore it is called "smart." This means that it satisfies the following criteria:

Knowledge modeling is a process of creating a computer interpretable model of knowledge or standard specifications about a kind of process and/or about a kind of facility or product. The resulting knowledge model can only be computer interpretable when it is expressed in some knowledge representation language or data structure that enables the knowledge to be interpreted by software and to be stored in a database or data exchange file.
Knowledge-based engineering or knowledge-aided design is a process of computer-aided usage of such knowledge models for the design of products, facilities or processes. The design of products or facilities then uses the knowledge model to guide the creation of the facility or product that need to be designed. In other words, it used knowledge about a kind of object to create a product model of an (imaginary) individual object. Similarly, the design of a particular process implies the creation of a process model, which design activity can be guided by the knowledge that is contained in a knowledge model about such a kind of process. The resulting process model, product model or facility model is typically also stored in a database.

The Semantics of Business Vocabulary and Business Rules (SBVR) is an adopted standard of the Object Management Group (OMG) intended to be the basis for formal and detailed natural language declarative description of a complex entity, such as a business. SBVR is intended to formalize complex compliance rules, such as operational rules for an enterprise, security policy, standard compliance, or regulatory compliance rules. Such formal vocabularies and rules can be interpreted and used by computer systems. SBVR is an integral part of the OMG's model-driven architecture (MDA).

<span class="mw-page-title-main">Semantic data model</span> Database model

A semantic data model (SDM) is a high-level semantics-based database description and structuring formalism for databases. This database model is designed to capture more of the meaning of an application environment than is possible with contemporary database models. An SDM specification describes a database in terms of the kinds of entities that exist in the application environment, the classifications and groupings of those entities, and the structural interconnections among them. SDM provides a collection of high-level modeling primitives to capture the semantics of an application environment. By accommodating derived information in a database structural specification, SDM allows the same information to be viewed in several ways; this makes it possible to directly accommodate the variety of needs and processing requirements typically present in database applications. The design of the present SDM is based on our experience in using a preliminary version of it. SDM is designed to enhance the effectiveness and usability of database systems. An SDM database description can serve as a formal specification and documentation tool for a database; it can provide a basis for supporting a variety of powerful user interface facilities, it can serve as a conceptual database model in the database design process; and, it can be used as the database model for a new kind of database management system.

<span class="mw-page-title-main">IDEF5</span>

IDEF5 is a software engineering method to develop and maintain usable, accurate domain ontologies. This standard is part of the IDEF family of modeling languages in the field of software engineering.

<span class="mw-page-title-main">Generic data model</span>

Generic data models are generalizations of conventional data models. They define standardised general relation types, together with the kinds of things that may be related by such a relation type.

<span class="mw-page-title-main">Ontology engineering</span> Field that studies the methods and methodologies for building ontologies

In computer science, information science and systems engineering, ontology engineering is a field which studies the methods and methodologies for building ontologies, which encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities of a given domain of interest. In a broader sense, this field also includes a knowledge construction of the domain using formal ontology representations such as OWL/RDF. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering. Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.

Contemporary ontologies share many structural similarities, regardless of the ontology language in which they are expressed. Most ontologies describe individuals (instances), classes (concepts), attributes, and relations.

References

  1. Van Renssen (2005).
  2. RFC 3986 (2005).
  3. Stevens & Lord (2012).
  4. "OWL Web Ontology Language Overview". w3c. Retrieved 5 April 2019.

Bibliography