Representation term

A representation term is a word, or a combination of words, that semantically represents the data type (value domain) of a data element. A representation term is commonly referred to as a class word by those familiar with data dictionaries. ISO/IEC 11179-5:2005 defines representation term as a designation of an instance of a representation class. As used in ISO/IEC 11179, the representation term is that part of a data element name that provides a semantic pointer to the underlying data type. A representation class is a class of representations; it provides a way to classify or group data elements.

A Representation Term may be thought of as an attribute of a data element in a metadata registry that classifies the data element according to the type of data stored in the data element.

Representation terms are typically "approved" by the organization or standards body using them. For example, the UN publishes its approved list as part of the UN/CEFACT Core Components Technical Specification. The Universal Data Element Framework uses a subset of CCTS representation terms and assigns numeric codes to those used.

Use cases for representation term

Managing value domains

A value domain expresses the set of allowed values for a data element. The representation terms (and typically the corresponding data type terms) form a taxonomy for the value domains within a data set. This taxonomy is the representation class. Thus the representation term can be used to control the proliferation of value domains by ensuring that equivalent value domains use the same representation term.

Finding equivalent properties

When a person or software agent analyzes two separate metadata registries to find equivalent properties, the Representation Term can be used as a guide. For example, if system A has a data element such as PersonGenderCode and system B has a data element such as PersonSexCode, the shared suffix lets the matching process consider only data elements that end in "Code". However, a taxonomy of property terms (e.g. "Sex" or "Gender") is much more efficient in this respect.
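The suffix-based narrowing described above can be sketched in a few lines of Python. This is only an illustration: the term list, the function names and the two sample registries are assumptions made for the example, and a real registry would match on more than the suffix.

```python
# Assumed list of representation terms, taken from suffixes used in this article.
REPRESENTATION_TERMS = ("DateTime", "Date", "Time", "Code", "ID", "Name", "Indicator")


def representation_term(element_name):
    """Return the known representation term that ends the name, or None."""
    # Longer terms are checked first so that "DateTime" wins over "Time".
    for term in sorted(REPRESENTATION_TERMS, key=len, reverse=True):
        if element_name.endswith(term):
            return term
    return None


def candidate_matches(registry_a, registry_b):
    """Pair element names from two registries that share a representation term."""
    pairs = []
    for name_a in registry_a:
        term_a = representation_term(name_a)
        if term_a is None:
            continue
        for name_b in registry_b:
            if representation_term(name_b) == term_a:
                pairs.append((name_a, name_b))
    return pairs


# Hypothetical registries for system A and system B.
system_a = ["PersonGenderCode", "PersonBirthDate"]
system_b = ["PersonSexCode", "PersonFamilyName"]
matches = candidate_matches(system_a, system_b)
print(matches)
```

The result is only a list of candidates. Deciding that "Gender" and "Sex" name the same property still needs a property-term taxonomy or human review.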

Inference

The Representation Term can be used in many ways to draw inferences about data sets. Representation Terms tell the observer of any data stream about the data types and give an indication of how the Data Element can be used. This is critical when mapping metadata registries to external Data Elements. For example, if you are sent a record about a person you may look for any "ID" suffix to understand how the remote system may differentiate two distinct records.

Required fields

Representation Terms are also used to make inferences about the requirements of a property. For example, if a data stream had the Data Element PersonBirthDateAndTime you would know that BOTH the date AND the time are available and relevant, not just the date. If the birth time were optional, separate data elements such as PersonBirthDate and PersonBirthTime should be used.
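As a rough illustration of the "both parts required" rule, the following Python sketch checks that a value carries an ISO 8601 date and a time. The function name is hypothetical, and the check assumes the common "T" separator and a time without a zone suffix.

```python
from datetime import date, time


def has_date_and_time(value):
    """Return True only if value holds both an ISO 8601 date and a time part."""
    date_part, separator, time_part = value.partition("T")
    if not separator or not time_part:
        # No "T" separator or nothing after it: the time is missing.
        return False
    try:
        date.fromisoformat(date_part)
        time.fromisoformat(time_part)
    except ValueError:
        return False
    return True


print(has_date_and_time("1990-08-14T09:30:00"))
print(has_date_and_time("1990-08-14"))
```

A value that fails this check would belong in a separate date element such as PersonBirthDate, not in a combined date-and-time element.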

Finding data warehouse dimensions and measures

When creating a data warehouse, a business analyst looks at the Representation Terms to quickly find the dimensions and measures of a subject matter in order to build OLAP cubes. For example:

  1. Indicator or Code terms are used to create data warehouse dimensions
  2. Date or DateTime terms relate to the time dimension, which is frequently shared between cubes using conformed dimensions
  3. Amount, Number, Measure or Value terms (which can be added together) are candidates for a measure
  4. Name and Text terms are used for screen labels or other descriptive elements
  5. Percent terms need to be analyzed, since percentages cannot be added together with clear meaning
  6. ID terms are used to remove duplicate records
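The list above amounts to a lookup from representation term to a warehouse role, which can be sketched as follows. The role labels, the function name and the sample element names OrderTotalAmount and DiscountPercent are assumptions for illustration, not part of any standard.

```python
# Role of each representation term, following the list above.
ROLE_BY_TERM = {
    "Indicator": "dimension",
    "Code": "dimension",
    "DateTime": "time dimension",
    "Date": "time dimension",
    "Amount": "measure",
    "Number": "measure",
    "Measure": "measure",
    "Value": "measure",
    "Name": "label",
    "Text": "label",
    "Percent": "needs analysis",
    "ID": "deduplication key",
}


def warehouse_role(element_name):
    """Classify a data element by the representation term that ends its name."""
    # Longer terms are checked first so that "DateTime" wins over "Date".
    for term in sorted(ROLE_BY_TERM, key=len, reverse=True):
        if element_name.endswith(term):
            return ROLE_BY_TERM[term]
    return "unclassified"


for name in ["PersonGenderCode", "PersonBirthDate", "OrderTotalAmount",
             "DiscountPercent", "PersonID"]:
    print(name, "->", warehouse_role(name))
```

Such a classification gives the analyst a first pass only; each candidate still has to be confirmed against the subject matter.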

Core Components Technical Specification

The joint ISO/UN Core Components Technical Specification formally defines both the allowed set of representation terms and the corresponding set of data types. ISO 15000-5 is an implementation layer of ISO 11179 and normatively expresses a set of rules to semantically define conceptual and physical/logical data models for a wide variety of uses. In ISO 15000-5, the representation term provides a mechanism to harmonize the value domains of candidate data elements before they are added to the overall data model(s). ISO 15000-5 is being used by a number of government agencies, standards development organizations, and private sector organizations as the basis for data modeling.

Universal Data Element Framework

Some informal standards such as the Universal Data Element Framework (which refers to a Representation Term as a "Property Word") assign unique integer IDs to each Representation Term. This allows metadata mapping tools to map one set of data elements into other metadata vocabularies. An example of these mappings can be found at Property word ID. Note that as of November 2005 the UDEF concepts have not been widely adopted.

Example of representation terms as an XML suffix

For example, consider the following XML data fragment:

    <Person>
      <PersonID>123-45-6789</PersonID>
      <PersonGivenName>John</PersonGivenName>
      <PersonFamilyName>Smith</PersonFamilyName>
      <PersonBirthDate>1990-08-14</PersonBirthDate>
    </Person>

In the example above, the Representation terms are "ID" for the <PersonID>, the suffix "Name" for the Given and Family names, and "Date" for the <PersonBirthDate>.
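The same reading of the fragment can be automated. The sketch below uses Python's standard xml.etree.ElementTree parser and assumes that element names are written in CamelCase with the representation term as the last word; the function name last_word is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = """
<Person>
  <PersonID>123-45-6789</PersonID>
  <PersonGivenName>John</PersonGivenName>
  <PersonFamilyName>Smith</PersonFamilyName>
  <PersonBirthDate>1990-08-14</PersonBirthDate>
</Person>
"""


def last_word(tag):
    """Return the final CamelCase word of a tag, keeping acronyms such as ID whole."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+", tag)
    return words[-1] if words else None


root = ET.fromstring(FRAGMENT)
# Map each child element name to the word treated as its representation term.
terms = {child.tag: last_word(child.tag) for child in root}
print(terms)
```

Taking the last word works for this fragment, but a compound term such as DateTime would need a lookup against an approved term list.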

Sample representation terms

The following are samples of Representation Terms that have been used for the exchange of electronic messages in systems such as NIEM or GJXDM 3.0: [note: the restrictions expressed here are limited to those specifications and do not represent universal consensus]

Sample Representation Terms
Term: Usage
Amount: Monetary value with units of currency.
BinaryObject: Set of finite-length sequences of binary octets used to represent sound, images and other structures.
Code: An enumerated list of all allowable values. Each enumerated value is a string that for brevity represents a specific meaning. For example, for a PersonGenderCode the valid values might be "male", "female" or "unknown".
Date: An ISO 8601 date, usually in the format YYYY-MM-DD.
DateTime: An ISO 8601 date (in the format YYYY-MM-DD) AND time structure. Note: Do not use unless BOTH the date AND time are REQUIRED fields. If one OR the other is optional, always specify the data elements as separate date and time elements.
Graphic: Used to store images. Secondary to BinaryObject.
ID: Abbreviation for Identifier.
Identifier: A language-independent label, sign or token used to establish the identity of, and uniquely distinguish, one instance of an object within an identification scheme.
Indicator: Boolean, exactly two mutually exclusive values (true or false). A precise definition must be given for the meaning of a true value.
Measure: Numeric value determined by measurement with units. Typically used with items such as height or weight. If the unit of measure is not clear it should be specified.
Name: A textual label used as identification of an object. A name is usually meaningful in some language, and is the primary means of identification of objects for humans. Unlike an identifier, a name is not necessarily unique.
Number: Assigned or determined by calculation.
Text: Character string generally in the form of words.
Time: An ISO 8601 time structure.
Value: A type of Numeric.
Percent: A type of Numeric that traditionally is the result of a ratio calculation and ranges from 0 to 1 for values of 0% to 100%.
Quantity: Non-monetary numeric value or count with units.
Rate: A type of Numeric.
Year: An ISO 8601 year.
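Some of the usage rules in the table can be checked mechanically. The following Python sketch covers only Date, Indicator and Percent, and the function and dictionary names are hypothetical; a term without a registered check is reported as None, not guessed at.

```python
from datetime import date


def is_iso_date(value):
    """Check that value parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True


def is_indicator(value):
    """An Indicator holds exactly one of the two Boolean values."""
    return isinstance(value, bool)


def is_percent(value):
    """A Percent is a ratio between 0 and 1 (0% to 100%)."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and 0 <= value <= 1


# Checks keyed by representation term; only a few terms are covered here.
VALIDATORS = {"Date": is_iso_date, "Indicator": is_indicator, "Percent": is_percent}


def validate(term, value):
    """Return True or False for a covered term, or None when no check exists."""
    check = VALIDATORS.get(term)
    return None if check is None else check(value)


print(validate("Date", "1990-08-14"))
print(validate("Percent", 25))
print(validate("Rate", 1.0))
```

Because the restrictions in the table apply only to the specifications named above, a check like this would need to be adjusted for any other term list.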

Pros of representation terms

Cons of representation terms

Standards that use representation terms

[Note] This is an extremely limited set of the wide range of standards that specify the use of representation terms.

See also

Notes

  1. ISO/IEC 11179-5, section 3.11.
  2. In ISO/IEC 11179-3:2003, section 5.4, it is actually representation class which is specified as an attribute of a data element.
