Data definition specification

In computing, a data definition specification (DDS) is a guideline to ensure comprehensive and consistent data definition. It represents the attributes required to quantify data definition. A comprehensive data definition specification encompasses enterprise data, the hierarchy of data management, prescribed guidance enforcement and criteria to determine compliance.

Overview

A data definition specification may be developed for any organization or specialized field, improving the quality of its products through consistency and transparency. It eliminates redundancy (since all contributing areas are referencing the same specification) and provides standardization, making it easier and more efficient to create, modify, verify, analyze and share information across the enterprise. [1]

To understand how a data definition specification works in an enterprise, we must look at the elements of a DDS. Writing data definitions, defining business terms (or rules) in the context of a particular environment, provides structure for an organization's data architecture. In developing these definitions, the words used must be traceable to clearly defined data.

A data definition specification may be used in the following activities:

Criteria

A data definition specification requires data definitions to be:

  • File – A concept identifiable with one word
  • File extension – A concept identifiable with more than one word

Applications

Enterprise data

A data definition specification was produced by the Open Mobile Alliance to document charging data. [4] The document, the centralized catalog of data elements defined for interfaces, specifies the mapping of these data elements to protocol fields in the interfaces.

Created for the exchange of financial data, Market Data Definition Language (MDDL) is an XML specification designed "to enable the interchange of information necessary to account, to analyze, and to trade financial instruments of the world's markets. It defines an XML-based interchange format and common data dictionary on the fields needed to describe: (1) financial instruments, (2) corporate events affecting value and tradability, and (3) market-related, economic and industrial indicators. The principal function of MDDL is to allow entities to exchange market data by standardizing formats and definitions. MDDL provides a common format for market data so that it can be efficiently passed from one processing system to another and provides a common understanding of market data content by standardizing terminology and by normalizing the relationships of various data elements to one another ... From the user perspective, the goal of MDDL is to enable users to integrate data from multiple sources by standardizing both the input feeds used for data warehousing (i.e., define what's being provided by vendors) and the output methods by which client applications request the data (i.e., ensure compatibility on how to get data in and out of applications)." [5]

Clinical submissions

The Clinical Data Interchange Standards Consortium (CDISC), a global, multidisciplinary, non-profit organization, has established standards to support the acquisition, exchange, submission and archiving of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent and freely available from the CDISC website. The Case Report Tabulation Data Definition Specification (define.xml) draft version 2.0, the newest data definition specification, is part of the evolution from the 1999 FDA electronic submission (eSub) guidance and electronic Common Technical Document (eCTD) documents, which specify that a submission must include a document describing the content and structure of its data. Define.xml was developed to automate the review process by generating a machine-readable data-definition document. Define.xml has standardized submissions to the Food and Drug Administration, reducing review times from over two years to several months. [6]

Archival data

A data definition specification is the foundation of metadata for scientific data archiving. The Metadata Encoding and Transmission Standard (METS) uses one principle of a DDS: consistent use of key terms to catalog digital objects for global use. The METS schema is a flexible mechanism for encoding descriptive, administrative and structural metadata for a digital library object and expressing complex links between metadata, and can provide a useful standard for the exchange of digital-library objects between repositories. [7]

A similar effort is underway to preserve complex data associated with video-game archiving. Preserving Virtual Worlds attempted to address archival-format deficiencies, citing the lack of suitable documentation for interactive fiction and games at the bit level: specifically, the absence of "representation information" needed to map raw bits into higher-level data constructs. [8] Preserving Virtual Worlds 2 is a research project expanding on initial efforts in this field. [9]

Related Research Articles

XML – Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

In metadata, the term data element is an atomic unit of data that has precise meaning or precise semantics. A data element has:

  1. An identification such as a data element name
  2. A clear data element definition
  3. One or more representation terms
  4. Optional enumerated values (see Code (metadata))
  5. A list of synonyms to data elements in other metadata registries (see Synonym ring)
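
The five parts listed above can be pictured as a small record type. The following is a minimal sketch in Python, not a format defined by any standard; the class name `DataElement`, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElement:
    """Illustrative record for one data element (hypothetical field names)."""

    name: str                                   # 1. identification
    definition: str                             # 2. clear definition
    representation_terms: Tuple[str, ...]       # 3. one or more representation terms
    enumerated_values: Tuple[str, ...] = ()     # 4. optional enumerated values
    synonyms: Tuple[str, ...] = ()              # 5. synonyms in other registries

    def __post_init__(self) -> None:
        # Items 1 to 3 are required, so empty values are rejected.
        if not self.name.strip():
            raise ValueError("a data element needs a name")
        if not self.definition.strip():
            raise ValueError("a data element needs a definition")
        if not self.representation_terms:
            raise ValueError("at least one representation term is required")

    def allows(self, value: str) -> bool:
        """True when no enumeration is given or the value is listed in it."""
        return not self.enumerated_values or value in self.enumerated_values


# Hypothetical example element with an enumerated value list.
country = DataElement(
    name="PersonBirthCountryCode",
    definition="The country in which a person was born.",
    representation_terms=("Code",),
    enumerated_values=("CA", "MX", "US"),
)
print(country.allows("US"))  # True
print(country.allows("ZZ"))  # False
```

Making the record immutable (`frozen=True`) is a design choice for the sketch: a registered definition should not change silently after other systems have started to rely on it.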

The Organization for the Advancement of Structured Information Standards is a nonprofit consortium that works on the development, convergence, and adoption of open standards for cybersecurity, blockchain, Internet of things (IoT), emergency management, cloud computing, legal data exchange, energy, content technologies, and other areas.

The Advanced Authoring Format (AAF) is a file format for professional cross-platform data interchange, designed for the video post-production and authoring environment. It was created by the Advanced Media Workflow Association (AMWA), and is now being standardized through the Society of Motion Picture and Television Engineers (SMPTE).

Learning object metadata – Data model

Learning Object Metadata is a data model, usually encoded in XML, used to describe a learning object and similar digital resources used to support learning. The purpose of learning object metadata is to support the reusability of learning objects, to aid discoverability, and to facilitate their interoperability, usually in the context of online learning management systems (LMS).

A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled method.

The ISO/IEC 11179 Metadata Registry (MDR) standard is an international ISO/IEC standard for representing metadata for an organization in a metadata registry. It documents the standardization and registration of metadata to make data understandable and shareable.

NIEMOpen, frequently referred to as NIEM, originated as an XML-based information exchange framework from the United States, but has transitioned to an OASIS Open Project. This initiative formalizes NIEM's designation as an official standard in national and international policy and procurement. NIEMOpen's Project Governing Board recently approved the first standard under this new project: the Conformance Targets Attribute Specification (CTAS) Version 3.0. A full collection of NIEMOpen standards is anticipated by the end of 2024.

A representation term is a word, or a combination of words, that semantically represents the data type of a data element. A representation term is commonly referred to as a class word by those familiar with data dictionaries. ISO/IEC 11179-5:2005 defines a representation term as a designation of an instance of a representation class. As used in ISO/IEC 11179, the representation term is that part of a data element name that provides a semantic pointer to the underlying data type. A representation class is a class of representations; it provides a way to classify or group data elements.
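
One way to picture the "semantic pointer" is to treat the final word of a data element name as its representation term. The sketch below assumes CamelCase names and a hypothetical list of class words; neither the list nor the function name `representation_term` comes from ISO/IEC 11179.

```python
import re
from typing import Optional

# Hypothetical class words; a real registry would define its own list.
REPRESENTATION_TERMS = {"Amount", "Code", "Date", "Indicator", "Name", "Text"}


def representation_term(element_name: str) -> Optional[str]:
    """Return the trailing class word of a CamelCase name, if it is recognised."""
    # Split the name into words that each start with an upper-case letter.
    words = re.findall(r"[A-Z][a-z0-9]*", element_name)
    if words and words[-1] in REPRESENTATION_TERMS:
        return words[-1]
    return None


print(representation_term("PersonBirthDate"))  # Date
print(representation_term("PersonBirth"))      # None
```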

A data element name is a name given to a data element in, for example, a data dictionary or metadata registry. In a formal data dictionary, there is often a requirement that no two data elements may have the same name, to allow the data element name to become an identifier, though some data dictionaries may provide ways to qualify the name in some way, for example by the application system or other context in which it occurs.
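
The uniqueness requirement and the option of qualifying a name by context can be sketched as a lookup keyed on both. This is an illustrative toy, not the interface of any particular data dictionary product; the class `DataDictionary`, its methods and the sample names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class DataDictionary:
    """Toy data dictionary keyed by (context, name); all names are hypothetical."""

    def __init__(self) -> None:
        self._definitions: Dict[Tuple[Optional[str], str], str] = {}

    def register(self, name: str, definition: str,
                 context: Optional[str] = None) -> None:
        """Add an element, refusing a name already used in the same context."""
        key = (context, name)
        if key in self._definitions:
            raise ValueError(
                f"duplicate data element name {name!r} in context {context!r}"
            )
        self._definitions[key] = definition

    def lookup(self, name: str, context: Optional[str] = None) -> str:
        """Return the definition registered for this name and context."""
        return self._definitions[(context, name)]


dd = DataDictionary()
# The same name may appear once per qualifying context.
dd.register("AccountStatusCode", "Status of a customer account.", context="billing")
dd.register("AccountStatusCode", "Status of a login account.", context="identity")

try:
    dd.register("AccountStatusCode", "Another definition.", context="billing")
except ValueError as exc:
    print(exc)  # the duplicate within "billing" is rejected
```

Because the pair of context and name acts as the key, the name alone works as an identifier only when no context is used, which mirrors the stricter requirement described above.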

In metadata, a data element definition is a human readable phrase or sentence associated with a data element within a data dictionary that describes the meaning or semantics of a data element.

In metadata, an indicator is a Boolean value that may contain only the values true or false. The definition of an Indicator must include the meaning of a true value and should also include the meaning if the value is false.

The Clinical Data Interchange Standards Consortium (CDISC) is a standards developing organization (SDO) dealing with medical research data linked with healthcare, to "enable information system interoperability to improve medical research and related areas of healthcare". The standards support medical research from protocol through analysis and reporting of results and have been shown to decrease resources needed by 60% overall and 70–90% in the start-up stages when they are implemented at the beginning of the research process.

The AgMES initiative was developed by the Food and Agriculture Organization (FAO) of the United Nations and aims to encompass issues of semantic standards in the domain of agriculture with respect to description, resource discovery, interoperability, and data exchange for different types of information resources.

The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium (W3C). The standard is maintained as part of the MARC standards of the Library of Congress, and is being developed as an initiative of the Digital Library Federation (DLF).

SDTM defines a standard structure for human clinical trial (study) data tabulations and for nonclinical study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA). The Submission Data Standards team of Clinical Data Interchange Standards Consortium (CDISC) defines SDTM.

Metadata – Data about data

Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including:

A metadata standard is a requirement which is intended to establish a common understanding of the meaning or semantics of the data, to ensure correct and proper use and interpretation of the data by its owners and users. To achieve this common understanding, a number of characteristics, or attributes of the data have to be defined, also known as metadata.

References

  1. Gouin, Deborah & Corcoran, Charmane K. (2008). Developing the MSU Enterprise Data Definition Standard. Michigan State University Web site: http://eis.msu.edu/uploads/---University%20EIS%20Working%20Committee%20Meetings/05%20August%202008/Enterprise%20Data%20Definition%20Standard%20Presentation082708.pdf Archived 2013-10-04 at the Wayback Machine
  2. Thomas, Gwen. (2008). Writing Enterprise-Quality Data Definitions: Tips for Creating Terms and Definitions. Data Governance Institute Web site: http://www.datagovernance.com/dgi_wp_writing_enterprise-quality_data_definitions.pdf Archived 2012-08-12 at the Wayback Machine
  3. International Organization for Standardization JTC1/SC32 Committee. (2004) ISO 11179-4. http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html.
  4. "Charging Data" (PDF). Open Mobile Alliance. 1 February 2011. pp. 6, 35. Archived from the original (PDF) on 6 October 2013. Retrieved 12 March 2014.
  5. "Market Data Definition Language (MDDL)". Cover Pages. December 26, 2002. Archived from the original on December 14, 2013. Retrieved March 12, 2014.
  6. "Define-XML". Clinical Data Interchange Standards Consortium. 2012. Archived from the original on October 4, 2013. Retrieved March 12, 2014.
  7. Metadata Encoding & Transmission Standard (METS). Library of Congress Standards Web site: https://www.loc.gov/standards/mets/
  8. “Meta Data Schema Development” (2008) Preserving Virtual Worlds website Archived 2012-03-26 at the Wayback Machine
  9. Preserving Virtual Worlds 2, Researching best practices for videogame preservation. (2012). http://pvw.illinois.edu/pvw2/ Archived 2012-11-14 at the Wayback Machine