EXPRESS (data modeling language)

Last updated June 28, 2023

EXPRESS is a standard, generic data modeling language for product data. It is formalized in the ISO Standard for the Exchange of Product model data (STEP, ISO 10303) and standardized as ISO 10303-11.^[1]

Overview

Data models formally define data objects and relationships among data objects for a domain of interest. Some typical applications of data models include supporting the development of databases and enabling the exchange of data for a particular area of interest. Data models are specified in a data modeling language.^[2] EXPRESS is a data modeling language defined in ISO 10303-11, the EXPRESS Language Reference Manual.^[3]

An EXPRESS data model can be defined in two ways: textually and graphically. For formal verification, and as input for tools such as SDAI, the textual representation in an ASCII file is the more important one. The graphical representation, on the other hand, is often better suited to human use, such as explanations and tutorials. The graphical representation, called EXPRESS-G, cannot represent all the details that can be formulated in the textual form.

EXPRESS is similar to programming languages such as Pascal. Within a SCHEMA, various datatypes can be defined together with structural constraints and algorithmic rules. A main feature of EXPRESS is the ability to formally validate a population of datatypes, that is, to check it against all the structural and algorithmic rules.
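What "validating a population" means can be sketched in a few lines of Python. This is a minimal illustration, not an EXPRESS tool: it assumes instances are plain dictionaries, reduces the schema to a hypothetical table of mandatory attributes (a structural rule) plus a list of predicate functions (algorithmic rules), and the names `MANDATORY`, `RULES`, `validate` and the rule `name_not_empty` are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified schema: the mandatory attributes of each entity type
# (a structural rule) ...
MANDATORY: Dict[str, List[str]] = {"Person": ["name"]}

# ... and algorithmic rules as (label, entity type, predicate) triples.
# The rule below is invented for illustration; it is not part of any STEP schema.
RULES: List[Tuple[str, str, Callable[[dict], bool]]] = [
    ("name_not_empty", "Person", lambda inst: len(inst.get("name") or "") > 0),
]


def validate(population: List[dict]) -> List[str]:
    """Return all violations; an empty list means the population is valid."""
    violations: List[str] = []
    for index, inst in enumerate(population):
        etype = inst["type"]
        # Structural check: every mandatory attribute must have a value.
        for attr in MANDATORY.get(etype, []):
            if inst.get(attr) is None:
                violations.append(f"#{index}: {etype}.{attr} is missing")
        # Algorithmic check: every rule declared for this type must hold.
        for label, rule_type, predicate in RULES:
            if rule_type == etype and not predicate(inst):
                violations.append(f"#{index}: {etype} violates {label}")
    return violations


population = [
    {"type": "Person", "name": "Ada"},  # valid
    {"type": "Person", "name": ""},     # breaks the algorithmic rule
    {"type": "Person"},                 # breaks both kinds of rule
]
for message in validate(population):
    print(message)
```

The point of the sketch is that validation looks at the whole population and reports every violation rather than stopping at the first one.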

EXPRESS-G

EXPRESS-G is a standard graphical notation for information models.^[4] It is a companion to the EXPRESS language for displaying entity and type definitions, relationships and cardinality.^[5] This graphical notation supports a subset of the EXPRESS language. One advantage of EXPRESS-G over EXPRESS is that the structure of a data model can be presented in a more understandable manner. A disadvantage is that complex constraints cannot be formally specified in EXPRESS-G. Figure 1 is an example. The data model presented in Figure 1 could be used to specify the requirements of a database for an audio compact disc (CD) collection.^[2]

Simple example

A simple EXPRESS data model is shown in Figure 2; its textual form is:

SCHEMA Family;

ENTITY Person
  ABSTRACT SUPERTYPE OF (ONEOF (Male, Female));
    name: STRING;
    mother: OPTIONAL Female;
    father: OPTIONAL Male;
END_ENTITY;

ENTITY Female
  SUBTYPE OF (Person);
END_ENTITY;

ENTITY Male
  SUBTYPE OF (Person);
END_ENTITY;

END_SCHEMA;

The data model is enclosed within the EXPRESS schema Family. It contains a supertype entity Person with the two subtypes Male and Female. Because Person is declared ABSTRACT, it cannot be instantiated on its own, and because of ONEOF, each person is an instance of exactly one of the subtypes Male or Female. Every person has a mandatory name attribute and, optionally, the attributes mother and father. Attributes of an entity type are read in a fixed style:

a Female can play the role of mother for a Person
a Male can play the role of father for a Person
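The Family schema can be mirrored with Python classes to make this reading concrete. This is only an analogy, not the SDAI binding or any standard language mapping: it assumes an EXPRESS entity maps to a dataclass, ABSTRACT maps to a class that refuses direct instantiation, and OPTIONAL maps to a value that may be None. ONEOF is approximated by the fact that a Python object has exactly one concrete class, so it is either a Male or a Female.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Mirrors ENTITY Person ABSTRACT SUPERTYPE OF (ONEOF (Male, Female)).
    name: str                          # mandatory attribute
    mother: Optional["Female"] = None  # OPTIONAL Female
    father: Optional["Male"] = None    # OPTIONAL Male

    def __post_init__(self) -> None:
        # ABSTRACT: a plain Person may not be created, only a subtype.
        if type(self) is Person:
            raise TypeError("Person is ABSTRACT; create a Male or a Female")
        # Runtime checks standing in for the attribute types Female and Male.
        if self.mother is not None and not isinstance(self.mother, Female):
            raise TypeError("mother must be a Female")
        if self.father is not None and not isinstance(self.father, Male):
            raise TypeError("father must be a Male")


@dataclass
class Female(Person):
    pass  # SUBTYPE OF (Person)


@dataclass
class Male(Person):
    pass  # SUBTYPE OF (Person)


mum = Female("Ann")
dad = Male("Bob")
child = Male("Carl", mother=mum, father=dad)
print(child.mother.name, child.father.name)  # Ann Bob
```

Calling `Person("Eve")` raises a TypeError, just as an EXPRESS population containing a bare Person would be invalid.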

EXPRESS building blocks

Datatypes

EXPRESS offers a series of datatypes, with specific data type symbols of the EXPRESS-G notation:^[2]

Entity data type: This is the most important datatype in EXPRESS. It is covered below in more detail. Entity datatypes can be related in two ways, in a sub-supertype tree and/or by attributes.
Enumeration data type: Enumeration values are simple names such as red, green, and blue for an rgb-enumeration. An enumeration type declared extensible can be extended in other schemas.
Defined data type: This further specializes another datatype, e.g. a datatype positive that is of type integer with a value > 0.
Select data type: Selects define a choice or an alternative between different options. Most commonly used are selects between different entity types. Selects that include defined types are rarer. A select type declared extensible can likewise be extended in other schemas.
Simple data type
- String: This is the most often used simple type. EXPRESS strings can be of any length and can contain any character (ISO 10646/Unicode).
- Binary: This data type is only rarely used. It covers a number of bits (not bytes). Some implementations limit the size to 32 bits.
- Logical: Similar to the boolean datatype, a logical has the possible values TRUE and FALSE, and in addition UNKNOWN.
- Boolean: Has the values TRUE and FALSE.
- Number: The number data type is a supertype of both integer and real. Most implementations use a double to represent a number, even if the actual value is an integer.
- Integer: EXPRESS integers can in principle have any length, but most implementations restrict them to a signed 32-bit value.
- Real: Ideally, an EXPRESS real value is unlimited in precision and size, but in practice it is represented by a floating-point value of type double.
Aggregation data type: The possible kinds of aggregation types are SET, BAG, LIST and ARRAY. SET and BAG are unordered, while LIST and ARRAY are ordered. A BAG may contain a particular value more than once; this is not allowed for a SET. An ARRAY is the only aggregate that may contain unset members (when its members are declared OPTIONAL); this is not possible for SET, LIST or BAG. The members of an aggregate may be of any other data type.
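The differences between the four aggregation types can be sketched with Python's standard containers. The mapping is an assumption made for illustration: None stands for an unset member, a BAG is compared as a multiset via `collections.Counter`, and size or index bounds (such as `[1:3]`) are ignored. The function names `is_valid_aggregate` and `same_value` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

Members = Sequence[Optional[Hashable]]  # None stands for an unset member


def is_valid_aggregate(kind: str, members: Members, optional_members: bool = False) -> bool:
    """Check a candidate value of kind SET, BAG, LIST or ARRAY (bounds are ignored)."""
    has_unset = any(m is None for m in members)
    if kind in ("SET", "BAG", "LIST") and has_unset:
        return False  # only an ARRAY may contain unset members ...
    if kind == "ARRAY" and has_unset and not optional_members:
        return False  # ... and only when its members are declared OPTIONAL
    if kind == "SET" and len(set(members)) != len(members):
        return False  # a SET may not contain a value twice; a BAG may
    return True


def same_value(kind: str, a: Members, b: Members) -> bool:
    """Compare two aggregates of the same kind."""
    if kind in ("LIST", "ARRAY"):
        return list(a) == list(b)  # ordered: position matters
    return Counter(a) == Counter(b)  # unordered: only members and their counts matter


print(is_valid_aggregate("SET", ["red", "red"]))            # False
print(same_value("BAG", ["a", "b", "a"], ["a", "a", "b"]))  # True
```

The same members in a different order are equal as a SET or BAG but not as a LIST or ARRAY.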

A few general points about datatypes:

Constructed datatypes can be defined within an EXPRESS schema. They are mainly used to define entities, and to specify the type of entity attributes and aggregate members.
Datatypes can be used recursively to build more and more complex datatypes. For example, it is possible to define a LIST of an ARRAY of a SELECT of either some entities or other datatypes. Whether it makes sense to define such datatypes is a different question.
EXPRESS defines rules for how a datatype can be further specialized. This is important for re-declared attributes of entities.
GENERIC data types can be used for procedures, functions and abstract entities.

Entity-Attribute

Entity attributes add "properties" to entities and relate one entity to another in a specific role. The name of the attribute specifies the role. Most datatypes can directly serve as the type of an attribute, including aggregation types.

There are three kinds of attributes: explicit, derived and inverse. All of them can be re-declared in a subtype. In addition, an explicit attribute can be re-declared as derived in a subtype. No other change of the kind of an attribute is possible.

Explicit attributes are those with direct values visible in a STEP-File.
Derived attributes get their values from an expression. In most cases the expression refers to other attributes of the same instance (SELF). The expression may also use EXPRESS functions.
Inverse attributes do not add "information" to an entity; they only name and constrain an explicit attribute that refers to this entity from the other end.
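The three kinds of attributes can be contrasted in a short Python analogy. It assumes explicit attributes map to stored fields, a derived attribute to a read-only computed property, and an inverse attribute to a lookup over the whole population; the `Point` and `Line` entities, the `lines_starting_at` helper and the `population` list are hypothetical and not taken from any STEP schema.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: instances compare by identity, like entity instances
class Point:
    x: float  # explicit attribute
    y: float  # explicit attribute


@dataclass(eq=False)
class Line:
    start: Point  # explicit attribute: a stored reference to another entity
    end: Point    # explicit attribute

    @property
    def length(self) -> float:
        # Derived attribute: computed from other attributes of SELF, never stored.
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


def lines_starting_at(point: Point, population: list) -> list:
    # Inverse attribute: adds no new data; it views the explicit attribute
    # Line.start "from the other end" by scanning the population.
    return [obj for obj in population if isinstance(obj, Line) and obj.start is point]


p, q, r = Point(0.0, 0.0), Point(3.0, 4.0), Point(6.0, 8.0)
population = [p, q, r, Line(p, q), Line(q, r), Line(p, r)]
print(population[3].length)                   # 5.0
print(len(lines_starting_at(p, population)))  # 2
```

Only the explicit attributes would be written to a STEP-File; the other two can always be recomputed from them.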

The EXPRESS-G notation has specific symbols for attributes.^[2]

Supertypes and subtypes

An entity can be defined to be a subtype of one or several other entities (multiple inheritance is allowed). A supertype can have any number of subtypes. It is very common practice in STEP to build very complex sub-supertype graphs. Some graphs relate 100 or more entities to each other.

An entity instance can be constructed for either a single entity (if not abstract) or for a complex combination of entities in such a sub-supertype graph. For large graphs, the number of possible combinations can grow astronomically. To restrict the possible combinations, special supertype constraints such as ONEOF and TOTALOVER were introduced. Furthermore, an entity can be declared abstract, so that no instance can be constructed of just that entity, only of a combination that contains a non-abstract subtype.
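The effect of these constraints on the number of possible combinations can be sketched by brute-force enumeration. This is a simplification that models one supertype with direct subtypes only: it assumes subtypes not named in any constraint combine freely (ANDOR), and it does not handle nested supertype expressions or deeper levels of the graph. The function `valid_combinations` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def valid_combinations(
    subtypes: Sequence[str],
    abstract: bool = False,
    oneof: Sequence[Sequence[str]] = (),
    totalover: Sequence[Sequence[str]] = (),
) -> List[FrozenSet[str]]:
    """List the subtype sets an instance of one supertype may combine.

    The empty set stands for an instance of the supertype alone. Subtypes
    not named in any constraint combine freely (ANDOR).
    """
    results: List[FrozenSet[str]] = []
    for size in range(len(subtypes) + 1):
        for combo in combinations(subtypes, size):
            chosen = frozenset(combo)
            if abstract and not chosen:
                continue  # ABSTRACT: the supertype alone cannot be instantiated
            if any(len(chosen & set(group)) > 1 for group in oneof):
                continue  # ONEOF: the listed subtypes exclude each other
            if any(not (chosen & set(group)) for group in totalover):
                continue  # TOTALOVER: at least one listed subtype must be present
            results.append(chosen)
    return results


# The Family schema: ABSTRACT SUPERTYPE OF (ONEOF (Male, Female)).
print(valid_combinations(["Male", "Female"], abstract=True, oneof=[["Male", "Female"]]))

# Without constraints, every extra subtype doubles the count (16, 256, 65536).
for n in (4, 8, 16):
    print(n, len(valid_combinations([f"S{i}" for i in range(n)])))
```

For the Family schema only the two combinations {Male} and {Female} survive, while an unconstrained supertype with n subtypes admits 2^n combinations.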

Algorithmic constraints

Entities and defined data types may be further constrained with WHERE rules. WHERE rules are also part of global rules. A WHERE rule is an expression that must evaluate to TRUE; otherwise, a population of an EXPRESS schema is not valid. Like derived attributes, these expressions may invoke EXPRESS functions, which may in turn invoke EXPRESS procedures. The functions and procedures allow formulating complex statements with local variables, parameters and constants, very similar to a programming language.

The EXPRESS language can describe local and global rules. For example:

ENTITY area_unit
  SUBTYPE OF (named_unit);
WHERE
  WR1: (SELF\named_unit.dimensions.length_exponent = 2) AND
       (SELF\named_unit.dimensions.mass_exponent = 0) AND
       (SELF\named_unit.dimensions.time_exponent = 0) AND
       (SELF\named_unit.dimensions.electric_current_exponent = 0) AND
       (SELF\named_unit.dimensions.thermodynamic_temperature_exponent = 0) AND
       (SELF\named_unit.dimensions.amount_of_substance_exponent = 0) AND
       (SELF\named_unit.dimensions.luminous_intensity_exponent = 0);
END_ENTITY; -- area_unit

This example states that an area_unit entity must have the dimension of a length squared: the attribute dimensions.length_exponent must be equal to 2, and the exponents of all other SI base units must be 0.
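Rule WR1 can be re-expressed as an ordinary predicate. The Python below is a sketch that assumes the `dimensions` attribute of a named_unit is a record of seven REAL exponents named as in the rule; the class `DimensionalExponents` and the function `area_unit_wr1` are illustrative names, not definitions from ISO 10303-41.

```python
from dataclasses import dataclass, fields


@dataclass
class DimensionalExponents:
    # One exponent per SI base quantity (assumed REAL-valued, default 0).
    length_exponent: float = 0.0
    mass_exponent: float = 0.0
    time_exponent: float = 0.0
    electric_current_exponent: float = 0.0
    thermodynamic_temperature_exponent: float = 0.0
    amount_of_substance_exponent: float = 0.0
    luminous_intensity_exponent: float = 0.0


def area_unit_wr1(dimensions: DimensionalExponents) -> bool:
    """Mirror of WR1: length exponent 2, all other exponents 0."""
    if dimensions.length_exponent != 2:
        return False
    return all(
        getattr(dimensions, f.name) == 0
        for f in fields(dimensions)
        if f.name != "length_exponent"
    )


square_metre = DimensionalExponents(length_exponent=2)
metre_per_second = DimensionalExponents(length_exponent=1, time_exponent=-1)
print(area_unit_wr1(square_metre))      # True
print(area_unit_wr1(metre_per_second))  # False
```

Iterating over the remaining fields keeps the check short, at the cost of being less explicit than the seven conjuncts spelled out in the EXPRESS rule.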

Another example:

TYPE day_in_week_number = INTEGER;
WHERE
  WR1: (1 <= SELF) AND (SELF <= 7);
END_TYPE; -- day_in_week_number

That is, a day_in_week_number value must be between 1 and 7 inclusive.
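A defined type with a WHERE rule behaves like a constrained version of its underlying type. As a Python sketch, assuming INTEGER maps to `int`, day_in_week_number can be written as an `int` subclass that checks WR1; the class name `DayInWeekNumber` is illustrative. In EXPRESS the rule is checked when a population is validated, so raising an exception at construction time is a design choice of this sketch, not EXPRESS behaviour.

```python
class DayInWeekNumber(int):
    """Sketch of TYPE day_in_week_number = INTEGER with WR1: (1 <= SELF) AND (SELF <= 7)."""

    def __new__(cls, value: int) -> "DayInWeekNumber":
        # Underlying type: INTEGER (bool is excluded although it subclasses int).
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("day_in_week_number is based on INTEGER")
        # WR1: SELF is the value of the defined type itself.
        if not (1 <= value <= 7):
            raise ValueError(f"WR1 violated: {value} is not in 1..7")
        return super().__new__(cls, value)


print(DayInWeekNumber(5) + 1)  # 6 (arithmetic returns a plain int)
try:
    DayInWeekNumber(8)
except ValueError as err:
    print(err)
```

The same pattern would apply to other constrained defined types, with the WHERE expression moved into the constructor check.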

In this way, rules can be attached to entities and types. More details on these examples can be found in ISO 10303-41.


References

This article incorporates public domain material from the National Institute of Standards and Technology.

1. ISO 10303-11:2004, Industrial automation systems and integration -- Product data representation and exchange -- Part 11: Description methods: The EXPRESS language reference manual.
2. Michael R. McCaleb (August 1999). "A Conceptual Data Model of Datum Systems". National Institute of Standards and Technology.
3. ISO International Standard 10303-11:1994, Industrial automation systems and integration -- Product data representation and exchange -- Part 11: Description methods: The EXPRESS language reference manual. International Organization for Standardization, Geneva, Switzerland (1994).
4. EXPRESS-G Language Overview. Archived 2008-11-09 at the Wayback Machine. Accessed 9 Nov 2008.
5. For information on the EXPRESS-G notation, consult Annex B of the EXPRESS Language Reference Manual (ISO 10303-11).
