Data element definition

In metadata, a data element definition is a human-readable phrase or sentence associated with a data element within a data dictionary that describes the meaning or semantics of that data element.

Data element definitions are critical for external users of any data system. Good definitions can dramatically ease the process of mapping one set of data into another set of data. This is a core feature of distributed computing and intelligent agent development.

There are several guidelines that should be followed when creating high-quality data element definitions.

Properties of clear definitions

A good definition is:

  1. Precise - The definition should use words that have a precise meaning. Try to avoid words that have multiple meanings or multiple word senses.
  2. Concise - The definition should use the shortest description that is still clear.
  3. Non-circular - The definition should not use the term being defined in the definition itself. Doing so produces a circular definition.
  4. Distinct - The definition should differentiate a data element from other data elements. This process is called disambiguation.
  5. Unencumbered - The definition should be free of embedded rationale, functional usage, and legal or metadata-registration details.

Definitions should not refer to terms or concepts that might be misinterpreted by others or that have different meanings based on the context of a situation. Definitions should not contain acronyms that are not clearly defined or linked to other precise definitions.
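The acronym rule lends itself to a simple automated check. The following is a minimal sketch rather than a standard tool: the function name `undefined_acronyms`, the glossary set, and the rule that any standalone run of two or more capital letters counts as an acronym are all assumptions made for illustration.

```python
import re


def undefined_acronyms(definition, glossary):
    """Return acronyms found in `definition` that are missing from `glossary`.

    Assumption: an acronym is any standalone run of two or more capital letters.
    """
    found = re.findall(r"\b[A-Z]{2,}\b", definition)
    missing = []
    for acronym in found:
        # Report each undefined acronym once, in order of first appearance.
        if acronym not in glossary and acronym not in missing:
            missing.append(acronym)
    return missing


# Hypothetical glossary of acronyms that already have precise definitions.
glossary = {"CDE"}
print(undefined_acronyms("Flags whether a CDE contains PHI.", glossary))
```

A check like this can only flag candidates. Whether an acronym is "clearly defined" still needs human review.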

If one is creating a large number of data elements, all the definitions should be consistent with related concepts.

Critical Data Element – Not all data elements are of equal importance or value to an organization. A key metadata property of an element is whether it is categorized as a Critical Data Element (CDE). This categorization provides focus for data governance and data quality. An organization often has various sub-categories of CDEs, based on how the data is used, for example:

  1. Security Coverage – Data elements that are categorized as personal health information (PHI) warrant particular attention for security and access.
  2. Marketing Department Usage – The marketing department could have a particular set of CDEs for identifying unique customers or for campaign management.
  3. Finance Department Usage – The finance department could have a different set of CDEs from marketing, focused on data elements that provide measures and metrics for fiscal reporting.
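One way to record this categorization is to tag each element with the sub-categories it belongs to. The sketch below is illustrative only: the class names `CdeCategory` and `ElementRecord`, the helper `elements_in`, and all element names are hypothetical, and treating "belongs to at least one sub-category" as the test for a CDE is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class CdeCategory(Enum):
    """Sub-categories taken from the list above."""
    SECURITY_COVERAGE = "Security Coverage"
    MARKETING_USAGE = "Marketing Department Usage"
    FINANCE_USAGE = "Finance Department Usage"


@dataclass
class ElementRecord:
    name: str
    cde_categories: set = field(default_factory=set)

    @property
    def is_critical(self):
        # Assumption: an element is a CDE if it has at least one sub-category.
        return bool(self.cde_categories)


def elements_in(category, records):
    """List the names of elements tagged with the given CDE sub-category."""
    return [r.name for r in records if category in r.cde_categories]


# Hypothetical element names, for illustration only.
records = [
    ElementRecord("PatientDiagnosisCode", {CdeCategory.SECURITY_COVERAGE}),
    ElementRecord("CustomerEmailAddress", {CdeCategory.MARKETING_USAGE}),
    ElementRecord("QuarterlyRevenueAmount", {CdeCategory.FINANCE_USAGE}),
    ElementRecord("OfficeFloorNumber"),
]
print(elements_in(CdeCategory.FINANCE_USAGE, records))
print([r.name for r in records if not r.is_critical])
```

Using a set of categories per element allows one element to be critical to several departments at once.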

Standards such as the ISO/IEC 11179 Metadata Registry specification give guidelines for creating precise data element definitions, specifically in Part 4 of the standard.

Using precise words

Common words such as play or run have many meanings. One lexical database documents over 57 distinct meanings for the word "play" but only a single definition for the term dramatic play. A word or term with fewer definitions in its dictionary entry is preferable, because this minimizes misinterpretation related to a reader's context and background. The process of identifying the intended meaning of a word is called word sense disambiguation.

Examples of definitions that could be improved

Here is the definition of the "person" data element as defined in the www.w3c.org Friend of a Friend specification:

  Person: A person.

Although most people have an intuitive understanding of what a person is, the definition has much room for improvement. The first problem is that the definition is circular, so it does not help most readers and needs to be clarified.

Here is the definition of the "Person" data element in the Global Justice XML Data Model 3.0:

  person: Describes inherent and frequently associated characteristics of a person.

Once again the definition is circular: a definition of person should not reference itself. It should use terms other than person to describe what a person is.

Here is a more precise and shorter definition of a person:

  Person: An individual human being.

Note that it uses the word individual to state that this is an instance of a class of things called human being. Technically you might use "Homo sapiens" in your definition, but more people are familiar with the term "human being", so commonly used terms are preferred as long as they are still precise.
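The circularity problem in the definitions above can be detected mechanically. The following is a rough sketch under stated assumptions: `is_circular` is a hypothetical helper, and it only looks for the term as a whole word or phrase, so inflected forms such as "persons" would not be caught.

```python
import re


def is_circular(term, definition):
    """Return True if `definition` contains `term` as a whole word or phrase."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The three definitions discussed in this section.
examples = [
    ("Person", "A person."),
    ("person", "Describes inherent and frequently associated characteristics of a person."),
    ("Person", "An individual human being."),
]
for term, definition in examples:
    print(is_circular(term, definition), "-", definition)
```

The first two definitions are reported as circular and the third is not.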

Sometimes a system embeds cultural norms and assumptions in its definitions. For example, if your "Person" data element tracked characters in a science fiction series that included aliens, you might need a more general term than human being:

  Person: An individual of a sentient species.

Related Research Articles

In metadata, the term data element is an atomic unit of data that has precise meaning or precise semantics. A data element has:

  1. An identification such as a data element name
  2. A clear data element definition
  3. One or more representation terms
  4. Optional enumerated values (codes)
  5. A list of synonyms to data elements in other metadata registries (a synonym ring)
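The five parts listed above map naturally onto a small record type. This is a sketch, not a schema taken from any standard: the class `DataElement`, its field names, the validation rules, and the example registry name are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    name: str                   # 1. identification
    definition: str             # 2. clear definition
    representation_terms: list  # 3. one or more representation terms
    # 4. optional enumerated values, stored as code -> meaning
    enumerated_values: dict = field(default_factory=dict)
    # 5. synonyms, stored as registry name -> element name in that registry
    synonyms: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject records that are missing one of the required parts.
        if not self.name.strip():
            raise ValueError("a data element needs a name")
        if not self.definition.strip():
            raise ValueError("a data element needs a definition")
        if not self.representation_terms:
            raise ValueError("a data element needs at least one representation term")


# Hypothetical element; the registry name and synonym are invented.
birth = DataElement(
    name="PersonBirthDate",
    definition="The calendar day on which an individual human being was born.",
    representation_terms=["Date"],
    synonyms={"ExampleRegistry": "dob"},
)
print(birth.name, birth.representation_terms, birth.synonyms)
```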

In computing and data management, data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks.
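In its simplest form, a data element mapping is a lookup from element names in one model to element names in another. The sketch below assumes a one-to-one rename with no value conversion. The function `map_record` and every element name in it are hypothetical.

```python
def map_record(record, field_map):
    """Rename the keys of `record` from the source model to the target model.

    Fields without a mapping are returned separately so they can be reviewed.
    """
    mapped, unmapped = {}, []
    for source_name, value in record.items():
        if source_name in field_map:
            mapped[field_map[source_name]] = value
        else:
            unmapped.append(source_name)
    return mapped, unmapped


# Hypothetical element names in two data models.
FIELD_MAP = {"PersonGivenName": "first_name", "PersonSurName": "last_name"}
source = {"PersonGivenName": "Ada", "PersonSurName": "Lovelace", "PersonNickName": "AAL"}
print(map_record(source, FIELD_MAP))
```

Returning the unmapped fields rather than silently dropping them makes gaps in the mapping visible, which is where clear element definitions on both sides help most.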

A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled method.

The ISO/IEC 11179 Metadata Registry (MDR) standard is an international ISO/IEC standard for representing metadata for an organization in a metadata registry. It documents the standardization and registration of metadata to make data understandable and shareable.

NIEM originated as an XML-based information exchange framework from the United States. NIEM also supports NIEM JSON exchanges. NIEM is currently developing the NIEM Metamodel and Common Model Format which can be expressed in any data serialization that NIEM supports. NIEM represents a collaborative partnership of agencies and organizations across all levels of government and with private industry. The purpose of this partnership is to effectively and efficiently share critical information at key decision points throughout the whole of the justice, public safety, emergency and disaster management, intelligence, and homeland security enterprise. NIEM is designed to develop, disseminate, and support enterprise-wide information exchange standards and processes that will enable jurisdictions to automate information sharing.

A representation term is a word, or a combination of words, that semantically represents the data type of a data element. A representation term is commonly referred to as a class word by those familiar with data dictionaries. ISO/IEC 11179-5:2005 defines representation term as a designation of an instance of a representation class. As used in ISO/IEC 11179, the representation term is that part of a data element name that provides a semantic pointer to the underlying data type. A representation class is a class of representations. This representation class provides a way to classify or group data elements.

A data element name is a name given to a data element in, for example, a data dictionary or metadata registry. In a formal data dictionary, there is often a requirement that no two data elements may have the same name, to allow the data element name to become an identifier, though some data dictionaries may provide ways to qualify the name in some way, for example by the application system or other context in which it occurs.

The Universal Data Element Framework (UDEF) was a controlled vocabulary developed by The Open Group. It provided a framework for categorizing, naming, and indexing data. It assigned to every item of data a structured alphanumeric tag plus a controlled vocabulary name that describes the meaning of the data. This allowed relating data elements to similar elements defined by other organizations.

Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system.

A representation term is a word, or a combination of words, used as part of a data element name. Representation class is sometimes used as a synonym for representation term.

Metadata publishing is the process of making metadata data elements available to external users, both people and machines, using a formal review process and a commitment to change control processes.

The Extended Metadata Registry (XMDR) is a project proposing and testing a set of extensions to the ISO/IEC 11179 metadata registry specifications that deal with the development of improved standards and technology for storing and retrieving the semantics of data elements, terminologies, and concept structures in metadata registries.

In metadata, an indicator is a Boolean value that may contain only the values true or false. The definition of an Indicator must include the meaning of a true value and should also include the meaning if the value is false.
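The rule for indicator definitions can be expressed as a small validated record. This is only a sketch: the class `IndicatorDefinition`, its field names, and the example indicator are hypothetical. It treats the meaning of true as required and the meaning of false as recommended, following the wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorDefinition:
    name: str
    true_meaning: str          # required: what a true value means
    false_meaning: str = ""    # recommended: what a false value means

    def __post_init__(self):
        if not self.true_meaning.strip():
            raise ValueError("an indicator definition must state what true means")

    def describe(self, value):
        """Return the documented meaning of a Boolean value."""
        if value:
            return self.true_meaning
        return self.false_meaning or "(meaning of false is not documented)"


# Hypothetical indicator, for illustration only.
newsletter = IndicatorDefinition(
    name="NewsletterOptInIndicator",
    true_meaning="The customer has agreed to receive the newsletter.",
    false_meaning="The customer has not agreed to receive the newsletter.",
)
print(newsletter.describe(True))
print(newsletter.describe(False))
```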

In information science and ontology, a classification scheme is the product of arranging things into kinds of things (classes) or into groups of classes; this bears similarity to categorization, but with perhaps a more theoretical bent, as classification can be applied over a wide semantic spectrum.

The Data Reference Model (DRM) is one of the five reference models of the Federal Enterprise Architecture.

A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for.

Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata.

A metadata standard is a requirement which is intended to establish a common understanding of the meaning or semantics of the data, to ensure correct and proper use and interpretation of the data by its owners and users. To achieve this common understanding, a number of characteristics, or attributes of the data have to be defined, also known as metadata.

A metadata repository is a database created to store metadata. Metadata is information about the structures that contain the actual data. Metadata is often said to be "data about data", but this is misleading. Data profiles are an example of actual "data about data". Metadata adds one layer of abstraction to this definition: it is data about the structures that contain data. Metadata may describe the structure of any data, of any subject, stored in any format.

In computing, a data definition specification (DDS) is a guideline to ensure comprehensive and consistent data definition. It represents the attributes required to quantify data definition. A comprehensive data definition specification encompasses enterprise data, the hierarchy of data management, prescribed guidance enforcement and criteria to determine compliance.

References

    Sources

    1. ISO/IEC 11179-4:2004 Metadata registries (MDR) - Part 4
    2. ISO/IEC Technical Report 20943-1, First edition, 2003-08-01 Information technology — Procedures for achieving metadata registry consistency