EXPRESS (data modeling language)

Fig 1. Requirements of a database for an audio compact disc (CD) collection, presented in EXPRESS-G notation.

EXPRESS is a standard data modeling language for product data. It is formalized in the ISO Standard for the Exchange of Product model data, STEP (ISO 10303), and standardized as ISO 10303-11. [1]

Overview

Data models formally define data objects and relationships among data objects for a domain of interest. Some typical applications of data models include supporting the development of databases and enabling the exchange of data for a particular area of interest. Data models are specified in a data modeling language. [2] EXPRESS is a data modeling language defined in ISO 10303-11, the EXPRESS Language Reference Manual. [3]

An EXPRESS data model can be defined in two ways: textually and graphically. For formal verification and as input for tools, such as implementations of SDAI, the textual representation within an ASCII file is the most important one. The graphical representation, on the other hand, is often more suitable for human use, such as explanations and tutorials. The graphical representation, called EXPRESS-G, cannot represent all details that can be formulated in the textual form.

EXPRESS is similar to programming languages such as Pascal. Within a SCHEMA, various datatypes can be defined together with structural constraints and algorithmic rules. A main feature of EXPRESS is the ability to formally validate a population of datatypes, that is, to check it against all the structural and algorithmic rules.

EXPRESS-G

EXPRESS-G is a standard graphical notation for information models. [4] It is a companion to the EXPRESS language for displaying entity and type definitions, relationships and cardinality. [5] This graphical notation supports a subset of the EXPRESS language. One advantage of EXPRESS-G over textual EXPRESS is that the structure of a data model can be presented in a more understandable manner. A disadvantage is that complex constraints cannot be formally specified. Figure 1 shows an example: the data model it presents could be used to specify the requirements of a database for an audio compact disc (CD) collection. [2]

Simple example

Fig 2. An EXPRESS-G diagram for Family schema

Figure 2 shows a simple EXPRESS data model in EXPRESS-G notation. The corresponding textual form is:

SCHEMA Family;

ENTITY Person
   ABSTRACT SUPERTYPE OF (ONEOF (Male, Female));
     name: STRING;
     mother: OPTIONAL Female;
     father: OPTIONAL Male;
END_ENTITY;

ENTITY Female
   SUBTYPE OF (Person);
END_ENTITY;

ENTITY Male
   SUBTYPE OF (Person);
END_ENTITY;

END_SCHEMA;

The data model is enclosed within the EXPRESS schema Family. It contains a supertype entity Person with the two subtypes Male and Female. Since Person is declared to be ABSTRACT, only occurrences of either (ONEOF) the subtype Male or the subtype Female can exist. Every occurrence of a person has a mandatory name attribute and, optionally, the attributes mother and father. There is a fixed style of reading for attributes of some entity type: a Female can play the role of mother for a Person, and a Male can play the role of father for a Person.
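To make the semantics of the Family schema concrete, the following is a minimal sketch in Python rather than EXPRESS. It is not how an EXPRESS toolkit represents entities; the class layout and the checks in `__post_init__` are assumptions chosen only to mirror the ABSTRACT declaration and the attribute types. The ONEOF constraint is only approximated: nothing here prevents someone from defining a class that inherits from both Male and Female.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Person:
    """Mirrors ENTITY Person, an ABSTRACT SUPERTYPE OF (ONEOF (Male, Female))."""

    name: str                          # mandatory attribute
    mother: Optional["Female"] = None  # OPTIONAL Female
    father: Optional["Male"] = None    # OPTIONAL Male

    def __post_init__(self) -> None:
        # ABSTRACT: a bare Person may not be instantiated.
        if type(self) is Person:
            raise TypeError("Person is abstract; create a Male or a Female")
        # The attribute types restrict who can play each role.
        if self.mother is not None and not isinstance(self.mother, Female):
            raise TypeError("mother must be a Female")
        if self.father is not None and not isinstance(self.father, Male):
            raise TypeError("father must be a Male")


@dataclass(eq=False)
class Female(Person):
    """Mirrors ENTITY Female SUBTYPE OF (Person)."""


@dataclass(eq=False)
class Male(Person):
    """Mirrors ENTITY Male SUBTYPE OF (Person)."""


eve = Female(name="Eve")
adam = Male(name="Adam")
cain = Male(name="Cain", mother=eve, father=adam)
```

In this sketch, `Person(name="Nobody")` raises a TypeError, as does `Male(name="Abel", mother=adam)`, because a Male cannot play the role of mother.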

EXPRESS Building blocks

Datatypes

EXPRESS offers a series of datatypes, each with a specific data type symbol in the EXPRESS-G notation. [2]

Fig. Data type symbols of the EXPRESS-G notation.

A few general things are to be mentioned for datatypes.

Entity-Attribute

Entity attributes make it possible to add "properties" to entities and to relate one entity to another in a specific role. The name of the attribute specifies the role. Most datatypes can directly serve as the type of an attribute, including aggregation types.

There are three kinds of attributes: explicit, derived and inverse. All of these can be re-declared in a subtype. In addition, an explicit attribute can be re-declared as derived in a subtype. No other change in the kind of an attribute is possible.

The EXPRESS-G notation has specific symbols for attributes. [2]

Fig. Attribute symbols of the EXPRESS-G notation.
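The three kinds of attributes can be illustrated with a small Python sketch. It is an analogy only, not EXPRESS syntax: the entities `Department` and `Employee`, their attributes, and the idea of passing the population as a list are all hypothetical choices made for this example. An explicit attribute is stored, a derived attribute is computed from other attributes, and an inverse attribute is found by looking at which instances refer to this one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)
class Department:
    name: str  # explicit attribute: stored directly

    def staff(self, population: List[object]) -> List["Employee"]:
        # Inverse attribute: not stored, but found by scanning the
        # population for employees whose works_for refers to this instance.
        return [
            item
            for item in population
            if isinstance(item, Employee) and item.works_for is self
        ]


@dataclass(eq=False)
class Employee:
    first_name: str        # explicit attribute
    last_name: str         # explicit attribute
    works_for: Department  # explicit attribute relating two entities

    @property
    def full_name(self) -> str:
        # Derived attribute: computed from explicit attributes.
        return f"{self.first_name} {self.last_name}"


research = Department("Research")
sales = Department("Sales")
ada = Employee("Ada", "Lovelace", research)
alan = Employee("Alan", "Turing", research)
population = [research, sales, ada, alan]
```

Here `ada.full_name` is computed on demand, and `research.staff(population)` collects the employees that point back at `research` through their `works_for` role.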

Supertypes and subtypes

An entity can be defined to be a subtype of one or several other entities, so multiple inheritance is allowed. A supertype can have any number of subtypes. It is very common practice in STEP to build very complex sub-supertype graphs. Some graphs relate 100 or more entities with each other.

An entity instance can be constructed for either a single entity (if it is not abstract) or for a complex combination of entities in such a sub-supertype graph. For big graphs, the number of possible combinations can grow astronomically. To restrict the possible combinations, special supertype constraints such as ONEOF and TOTAL_OVER were introduced. Furthermore, an entity can be declared abstract to enforce that no instance can be constructed of this entity alone, but only in combination with a non-abstract subtype.
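The growth in the number of combinations can be sketched with a short Python enumeration. This is a deliberately simplified model, assuming a single supertype with a flat list of direct subtypes and only two cases: either any subset of subtypes may be combined, or ONEOF allows at most one. The function name and parameters are hypothetical, and real sub-supertype graphs and constraint expressions are considerably richer.

```python
from itertools import combinations
from typing import List, Tuple


def complex_instances(
    supertype: str, subtypes: List[str], oneof: bool, abstract: bool
) -> List[Tuple[str, ...]]:
    """List the entity combinations that an instance may be built from."""
    results: List[Tuple[str, ...]] = []
    if not abstract:
        # A non-abstract supertype can be instantiated on its own.
        results.append((supertype,))
    # ONEOF permits at most one subtype; otherwise any subset may combine.
    max_size = 1 if oneof else len(subtypes)
    for size in range(1, max_size + 1):
        for combo in combinations(subtypes, size):
            results.append((supertype,) + combo)
    return results


family = complex_instances("Person", ["Male", "Female"], oneof=True, abstract=True)
unconstrained = complex_instances(
    "shape", [f"subtype_{i}" for i in range(10)], oneof=False, abstract=False
)
```

With ONEOF and an abstract supertype, the Family example allows exactly two combinations. Without a constraint, ten subtypes already give 2^10 = 1024 combinations in this simplified model, which is why supertype constraints matter for large graphs.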

Algorithmic constraints

Entities and defined data types may be further constrained with WHERE rules. WHERE rules are also part of global rules. A WHERE rule is an expression that must evaluate to TRUE; otherwise, a population of an EXPRESS schema is not valid. Like derived attributes, these expressions may invoke EXPRESS functions, which may in turn invoke EXPRESS procedures. The functions and procedures allow complex statements to be formulated with local variables, parameters and constants, much as in a programming language.

The EXPRESS language can describe local and global rules. For example:

ENTITY area_unit
   SUBTYPE OF (named_unit);
WHERE
   WR1: (SELF\named_unit.dimensions.length_exponent = 2) AND
        (SELF\named_unit.dimensions.mass_exponent = 0) AND
        (SELF\named_unit.dimensions.time_exponent = 0) AND
        (SELF\named_unit.dimensions.electric_current_exponent = 0) AND
        (SELF\named_unit.dimensions.thermodynamic_temperature_exponent = 0) AND
        (SELF\named_unit.dimensions.amount_of_substance_exponent = 0) AND
        (SELF\named_unit.dimensions.luminous_intensity_exponent = 0);
END_ENTITY; -- area_unit

This example states that an area_unit entity must have the dimension of length squared. To ensure this, the attribute dimensions.length_exponent must be equal to 2 and the exponents of all other SI base units must be 0.
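As a rough illustration of what rule WR1 checks, the same condition can be written as a Python predicate. The attribute names are taken from the rule above; the `DimensionalExponents` class and the function name are assumptions made for this sketch, and it does not model the `named_unit` supertype or the `SELF\` group qualification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionalExponents:
    """Exponents of the seven SI base quantities, all defaulting to 0."""

    length_exponent: float = 0
    mass_exponent: float = 0
    time_exponent: float = 0
    electric_current_exponent: float = 0
    thermodynamic_temperature_exponent: float = 0
    amount_of_substance_exponent: float = 0
    luminous_intensity_exponent: float = 0


def satisfies_area_unit_wr1(dimensions: DimensionalExponents) -> bool:
    """Return True when the exponents describe length squared and nothing else."""
    return (
        dimensions.length_exponent == 2
        and dimensions.mass_exponent == 0
        and dimensions.time_exponent == 0
        and dimensions.electric_current_exponent == 0
        and dimensions.thermodynamic_temperature_exponent == 0
        and dimensions.amount_of_substance_exponent == 0
        and dimensions.luminous_intensity_exponent == 0
    )


square_metre = DimensionalExponents(length_exponent=2)
metre = DimensionalExponents(length_exponent=1)
```

A unit with the exponents of `square_metre` passes the check, while `metre`, or any unit with a non-zero exponent for another base quantity, does not.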

Another example:

TYPE day_in_week_number = INTEGER;
WHERE
   WR1: (1 <= SELF) AND (SELF <= 7);
END_TYPE; -- day_in_week_number

This rule restricts a day_in_week_number to the integers from 1 to 7.
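A constrained defined type of this kind can be approximated in Python by a subclass of `int` that checks the rule when a value is created. This is a sketch under one notable assumption: it rejects invalid values at construction time, whereas an EXPRESS WHERE rule is a condition checked when a population is validated. The class name is hypothetical.

```python
class DayInWeekNumber(int):
    """An integer restricted to 1..7, mirroring WR1 of day_in_week_number."""

    def __new__(cls, value: int) -> "DayInWeekNumber":
        instance = super().__new__(cls, value)
        # WR1: (1 <= SELF) AND (SELF <= 7)
        if not (1 <= instance <= 7):
            raise ValueError(f"day_in_week_number must be in 1..7, got {value}")
        return instance


first_day = DayInWeekNumber(1)
last_day = DayInWeekNumber(7)
```

Calling `DayInWeekNumber(0)` or `DayInWeekNumber(8)` raises a ValueError, while valid values behave like ordinary integers.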

Rules of this kind can be attached to entities and defined types in the same way. More details on the given examples can be found in ISO 10303-41.

References

This article incorporates public domain material from the National Institute of Standards and Technology.

  1. ISO 10303-11:2004 Industrial automation systems and integration -- Product data representation and exchange -- Part 11: Description methods: The EXPRESS language reference manual
  2. Michael R. McCaleb (1999). "A Conceptual Data Model of Datum Systems". National Institute of Standards and Technology. August 1999.
  3. ISO International Standard 10303-11:1994, Industrial automation systems and integration -- Product data representation and exchange -- Part 11: Description methods: The EXPRESS language reference manual, International Organization for Standardization, Geneva, Switzerland (1994).
  4. EXPRESS-G Language Overview. Archived 2008-11-09 at the Wayback Machine. Accessed 9 Nov 2008.
  5. For information on the EXPRESS-G notation, consult Annex B of the EXPRESS Language Reference Manual (ISO 10303-11).
