A metadata standard is a requirement intended to establish a common understanding of the meaning, or semantics, of data, so that its owners and users use and interpret it correctly. To achieve this common understanding, a number of characteristics, or attributes, of the data have to be defined; these are known as metadata. [1]
Metadata is often defined as data about data. [2] [3] [4] It is “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use or manage an information resource”, especially in a distributed network environment such as the internet or an organization. [5] A good example of metadata is the cataloging system found in libraries, which records, for example, the author, title, subject, and shelf location of a resource. Another is the knowledge extracted from software systems about software objects such as data flows, control flows, call maps, architectures, business rules, business terms, and database schemas.
Metadata is usually categorized into three types: descriptive, structural, and administrative. [6] [3]
Metadata elements grouped into sets designed for a specific purpose, e.g., for a specific domain or a particular type of information resource, are called metadata schemas. For every element, the name and the semantics (the meaning of the element) are specified. Content rules (how content must be formulated), representation rules (e.g., capitalization rules), and allowed element values (e.g., from a controlled vocabulary) can optionally be specified. Some schemas also specify the syntax in which the elements must be encoded, in contrast to syntax-independent schemas. Many current schemas use Standard Generalized Markup Language (SGML) or XML to specify their syntax. Metadata schemas that are developed and maintained by standards organizations (such as ISO) or by organizations that have taken on such responsibility (such as the Dublin Core Metadata Initiative) are called metadata standards.
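The parts of a schema described above (element name, semantics, and optional allowed values) can be illustrated with a minimal sketch. The three-element schema, the `Element` class, and the `validate` function below are hypothetical and are not taken from any published standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """One schema element: a name, its semantics, and optional value rules."""
    name: str
    semantics: str
    required: bool = False
    allowed_values: Optional[frozenset] = None  # controlled vocabulary, if any


# Hypothetical three-element schema, for illustration only.
SCHEMA = {
    e.name: e
    for e in (
        Element("title", "A name given to the resource", required=True),
        Element("creator", "The entity primarily responsible for the resource"),
        Element("type", "The nature of the resource",
                allowed_values=frozenset({"Text", "Image", "Sound"})),
    )
}


def validate(record: dict) -> list:
    """Return a list of problems found in the record; empty means valid."""
    problems = []
    # Elements the schema does not define.
    for name in record:
        if name not in SCHEMA:
            problems.append(f"unknown element: {name}")
    # Required elements and controlled vocabularies.
    for name, element in SCHEMA.items():
        if name not in record:
            if element.required:
                problems.append(f"missing required element: {name}")
            continue
        allowed = element.allowed_values
        if allowed is not None and record[name] not in allowed:
            problems.append(f"value not in vocabulary for {name}: {record[name]}")
    return problems
```

Calling `validate({"title": "Walden", "type": "Text"})` returns an empty list, while a record without a title or with a type outside the vocabulary yields one message per problem.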
Many different metadata schemas are being developed as standards across disciplines, such as library science, education, archiving, e-commerce, and arts. In the table below, an overview of available metadata standards is given.
Name | Focus | Description |
---|---|---|
DDI [7] | Archiving and Social Science | The Data Documentation Initiative is an international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML. |
EBUCore [8] | Audiovisual content | EBUCore is a set of descriptive and technical metadata based on the Dublin Core and adapted to media. EBUCore is the flagship metadata specification of the EBU, [9] the largest professional association of broadcasters in the world. It is developed and maintained by the EBU's Technical Department. [10] The EBU has a long history of defining metadata solutions for broadcasters. [11] EBUCore is widely used, as shown in this report. [12] EBUCore is registered with SMPTE. It is also available in RDF. [13] |
EBU CCDM [14] | Audiovisual programme production | The EBU Class Conceptual Data Model (CCDM) is an ontology defining a basic set of classes and properties as a common vocabulary to describe programmes in their different phases of creation, from commissioning to delivery. CCDM is a common framework, and users are invited to enrich the model with classes and properties that fit their needs more specifically. |
FOAF [15] | Friend of a Friend (FOAF) | The Friend of a Friend (FOAF) project is about creating a Web of machine-readable homepages describing people, the links between them and the things they create and do. |
EAD [16] | Archiving | Encoded Archival Description is a standard for encoding archival finding aids using XML in archival and manuscript repositories. |
CDWA [17] | Arts | Categories for the Description of Works of Art is a conceptual framework for describing and accessing information about works of art, architecture, and other material culture. |
VRA Core [18] | Arts | Visual Resources Association provides a categorical organization for the description of works of visual culture as well as the images that document them. |
Darwin Core [19] | Biology | The Darwin Core is a metadata specification for information about the geographic occurrence of species and the existence of specimens in collections. |
ONIX [20] | Book industry | Online Information Exchange is an international standard for representing and communicating book industry product information in electronic form. |
CWM | Data warehousing | The main purpose of the Common Warehouse Metamodel is to enable easy interchange of warehouse and business intelligence metadata in distributed heterogeneous environments. |
EML [21] | Ecology | Ecological Metadata Language is a specification developed for the ecology discipline. |
IEEE LOM [22] | Education | IEEE Learning Object Metadata (LOM) specifies the syntax and semantics of metadata that describes learning objects. |
CSDGM [23] | Geographic data | Content Standard for Digital Geospatial Metadata is maintained by the Federal Geographic Data Committee (FGDC). |
ISO 19115 [24] | Geographic data | ISO 19115:2003, Geographic information - Metadata, defines how to describe geographical information and associated services, including content, spatial-temporal extent, data quality, access, and rights of use. It is maintained by the ISO/TC 211 committee. |
e-GMS [25] | Government | The e-Government Metadata Standard (e-GMS) defines the metadata elements for information resources to ensure maximum consistency of metadata across public sector organizations in the UK. |
GILS [26] | Government/organizations | The Global Information Locator Service defines an open, low-cost, and scalable standard so that governments, companies, or other organizations can help searchers find information. |
TEI [27] | Humanities, social sciences and linguistics | Text Encoding Initiative is a standard for the representation of texts in digital form, chiefly in the humanities, social sciences and linguistics. |
NISO MIX [28] | Images | NISO Z39.87 is a data dictionary of technical metadata for digital still images. NISO Metadata for Images in XML (MIX) is an XML schema for the set of technical data elements required to manage digital image collections. |
<indecs> [29] | Intellectual property | The indecs Content Model (Interoperability of Data in E-Commerce Systems) addresses the need to bring different creation identifiers and their metadata into a single framework that supports the management of intellectual property rights. |
MARC [30] | Librarianship | MAchine-Readable Cataloging (MARC) is a set of standards for the representation and communication of bibliographic and related information in machine-readable form. |
METS [31] | Librarianship | Metadata Encoding and Transmission Standard is an XML schema for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. |
MODS [32] | Librarianship | Metadata Object Description Schema is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. |
MADS [33] | Librarianship | Metadata Authority Description Schema is a schema for authority control that may be used for a variety of purposes, especially managing names of people, organizations, or geographical areas, and particularly for library applications. |
XOBIS [34] | Librarianship | XML Organic Bibliographic Information Schema is an XML schema for modeling MARC data. |
PBCore [35] | Media | PBCore is a metadata and cataloging resource for public broadcasters and associated communities. |
MPEG-7 [36] | Multimedia | The Multimedia Content Description Interface MPEG-7 is an ISO/IEC standard and specifies a set of descriptors to describe various types of multimedia information; it is developed by the Moving Picture Experts Group. |
MEI [37] | Music notation | Music Encoding Initiative is a community-driven effort to create a commonly accepted, digital, symbolic representation of music notation documents. |
Dublin Core [38] | Networked resources | Dublin Core is an interoperable online metadata standard focused on networked resources. |
DOI [39] | Networked resources | Digital Object Identifier provides a system for the identification and hence management of information ("content") on digital networks, providing persistence and semantic interoperability. |
ISO/IEC 11179 [40] | Organizations | ISO/IEC 11179 Standard describes the metadata and activities needed to manage data elements in a registry to create a common understanding of data across organizational elements and between organizations. |
ISO/IEC 19506 [41] | Software systems | ISO/IEC 19506, the Knowledge Discovery Metamodel, is an ontology for describing software systems. The standard provides both a detailed ontology and a common data format for representing granular software objects and their relationships, enabling the extraction of data flows, control flows, call maps, architecture, database schemas, and business rules and terms, as well as the derivation of business processes. It is used primarily for the security, compliance, and modernization of legacy and existing systems. |
ISO 23081 [42] | Records management | ISO 23081 is a three-part technical specification defining metadata needed to manage records. Part 1 addresses principles, part 2 addresses conceptual and implementation issues, and part 3 outlines a self-assessment method. |
MoReq2010 [43] | Records management | MoReq2010 is a specification describing the MOdel REQuirements for the management of electronic records. |
DIF [44] | Scientific data sets | Directory Interchange Format is a descriptive and standardized format for exchanging information about scientific data sets. |
RAD | Librarianship and archiving | The Rules for Archival Description (RAD) is the Canadian archival descriptive standard. It is overseen by the Canadian Committee on Archival Description of the Canadian Council of Archives. [45] Similar in structure to AACR2, it was last revised in 2008. [46] |
RDF | Web resources | Resource Description Framework (RDF) is a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats. |
MDDL [47] | Financial market | The (Financial) Market Data Definition Language (MDDL) was developed by the Financial Information Services Division (FISD) of the Software and Information Industry Association (SIIA). MDDL is a specification derived from Extensible Markup Language (XML) that facilitates the interchange of information about financial instruments used throughout the world's financial markets. MDDL helps map all market data into a common language and structure to ease the interchange and processing of multiple complex data sets. |
NIEM [48] | Law enforcement; Social services; Enterprise resource planning | NIEM – the National Information Exchange Model – is a community-driven, US government-wide, standards-based approach to exchanging information. NIEM's data domains are growing standards developed and maintained by domain communities. Some sample domains included or being developed in NIEM are: chemistry/biology/radiation/nuclear; justice; intelligence; immigration; international trade; biometrics; emergency management; screening; human services; children, youth, and family services; health; infrastructure protection; military operations; maritime; and surface transportation. |
SAML [49] | Authentication and authorization | Security Assertion Markup Language is an XML-based open standard data format for exchanging authentication and authorization data between parties. Shibboleth, developed by Internet2/MACE, provides a method of distributed authentication and authorization for participating HTTP(S)-based applications. A schema example can be found at OASIS [50] (Advancing open standards for the information society). |
The Dublin Core, also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen main metadata items for describing digital or physical resources. It was the first metadata standard for describing web content. The Dublin Core Metadata Initiative (DCMI) is responsible for formulating the Dublin Core; DCMI is a project of the Association for Information Science and Technology (ASIS&T), a non-profit organization.
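As an illustration, the fifteen elements can be serialized as XML with Python's standard library. This is a minimal sketch: the namespace URI is the one published for the element set, but the `metadata` wrapper element and the `to_dublin_core_xml` helper are assumptions made for this example, not part of the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen DCMES elements.
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core_xml(record: dict) -> str:
    """Serialize a flat {element: value} record as namespaced XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in record.items():
        if name not in DCMES_ELEMENTS:
            raise ValueError(f"not a DCMES element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `to_dublin_core_xml({"title": "Walden", "creator": "Thoreau, Henry David"})` produces a `metadata` element containing `dc:title` and `dc:creator` children. Any key outside the fifteen elements is rejected.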
ANSEL, the American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use, was a character set used in text encoding. It provided a table of coded values for the representation of characters of the extended Latin alphabet in machine-readable form for thirty-five languages written in the Latin alphabet and for fifty-one romanized languages. ANSEL adds 63 graphic characters to ASCII, including 29 combining diacritic characters.
A digital object identifier (DOI) is a persistent identifier or handle used to uniquely identify various objects, standardized by the International Organization for Standardization (ISO). DOIs are an implementation of the Handle System; they also fit within the URI system. They are widely used to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications.
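A DOI name consists of a prefix beginning with `10.` and a suffix, separated by a slash, and it can be expressed as a URI through the `https://doi.org/` resolver. The sketch below shows that mapping; the helper names `split_doi` and `doi_to_url` are hypothetical, and the validation is deliberately loose rather than a full implementation of the DOI syntax rules.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    # Loose check only: prefix must look like "10.<registrant>".
    if not (prefix.startswith("10.") and len(prefix) > 3 and sep and suffix):
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI name, percent-encoding the suffix."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"
```

With this sketch, `doi_to_url("10.1000/182")` gives `https://doi.org/10.1000/182`.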
MARC is a standard set of digital formats for the machine-readable description of items catalogued by libraries, such as books, DVDs, and digital resources. Computerized library catalogs and library management software need to structure their catalog records as per an industry-wide standard, which is MARC, so that bibliographic information can be shared freely between computers. The structure of bibliographic records almost universally follows the MARC standard. Other standards work in conjunction with MARC, for example, Anglo-American Cataloguing Rules (AACR)/Resource Description and Access (RDA) provide guidelines on formulating bibliographic data into the MARC record structure, while the International Standard Bibliographic Description (ISBD) provides guidelines for displaying MARC records in a standard, human-readable form.
ISO 639-3:2007, Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. It defines three-letter codes for identifying languages. The standard was published by the International Organization for Standardization (ISO) on 1 February 2007.
The National Information Standards Organization is a United States non-profit standards organization that develops, maintains and publishes technical standards related to publishing, bibliographic and library applications. It was founded in 1939 as the Z39 Committee, chaired from 1963 to 1977 by Jerrold Orne, incorporated as a not-for-profit education association in 1983, and assumed its current name in 1984.
An OpenURL is similar to a web address, but instead of referring to a physical website, it refers to an article, book, patent, or other resource within a website.
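In practice, an OpenURL is a resolver address followed by a query string of key/value pairs describing the resource. The sketch below builds one for a journal article using the key/encoded-value convention of OpenURL 1.0 (Z39.88-2004). The resolver address is hypothetical, and the `build_openurl` helper and its choice of fields are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical link resolver; a real one is specific to each institution.
RESOLVER = "https://resolver.example.org/openurl"


def build_openurl(atitle: str, jtitle: str, issn: str, volume: str, date: str) -> str:
    """Build an OpenURL describing a journal article."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
        "rft.atitle": atitle,   # article title
        "rft.jtitle": jtitle,   # journal title
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.date": date,
    }
    return f"{RESOLVER}?{urlencode(params)}"
```

The description travels with the link itself, so a resolver can decide where the resource is available instead of pointing at one fixed location.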
A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled method.
The ISO/IEC 11179 metadata registry (MDR) standard is an international ISO/IEC standard for representing metadata for an organization in a metadata registry. It documents the standardization and registration of metadata to make data understandable and shareable.
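The idea of a registry as one controlled place for metadata definitions can be sketched as follows. This is a toy illustration, not the ISO/IEC 11179 metamodel: the `DataElement` and `MetadataRegistry` classes and their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A registered data element with its human-readable definition."""
    name: str
    definition: str
    datatype: str


class MetadataRegistry:
    """Central store for data element definitions (toy sketch)."""

    def __init__(self):
        self._elements = {}

    def register(self, element: DataElement) -> None:
        # Controlled maintenance: a name may be registered only once.
        if element.name in self._elements:
            raise KeyError(f"already registered: {element.name}")
        self._elements[element.name] = element

    def lookup(self, name: str) -> DataElement:
        """Return the single agreed definition for a name."""
        return self._elements[name]
```

Because every system looks up the same entry, two applications that exchange a `PersonBirthDate` value share one definition of what it means.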
NIEMOpen, frequently referred to as NIEM, originated as an XML-based information exchange framework from the United States, but has transitioned to an OASIS Open Project. This initiative formalizes NIEM's designation as an official standard in national and international policy and procurement. NIEMOpen's Project Governing Board recently approved the first standard under this new project: the Conformance Targets Attribute Specification (CTAS) Version 3.0. A full collection of NIEMOpen standards is anticipated by the end of 2024.
A representation term is a word, or a combination of words, that semantically represents the data type of a data element. A representation term is commonly referred to as a class word by those familiar with data dictionaries. ISO/IEC 11179-5:2005 defines a representation term as a designation of an instance of a representation class. As used in ISO/IEC 11179, the representation term is that part of a data element name that provides a semantic pointer to the underlying data type. A representation class is a class of representations; it provides a way to classify or group data elements.
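A short sketch shows how a representation term can be read off a data element name. It assumes a naming convention in which names are written in CamelCase and the representation term is the final word. The convention, the vocabulary of terms, and the `representation_term` helper are assumptions made for illustration, not requirements of ISO/IEC 11179.

```python
import re
from typing import Optional

# Hypothetical vocabulary of representation terms (class words).
REPRESENTATION_TERMS = frozenset({
    "Amount", "Code", "Date", "Identifier", "Indicator", "Name", "Text",
})


def representation_term(element_name: str) -> Optional[str]:
    """Return the trailing class word of a CamelCase name, if it is one."""
    words = re.findall(r"[A-Z][a-z0-9]*", element_name)
    if words and words[-1] in REPRESENTATION_TERMS:
        return words[-1]
    return None
```

Under these assumptions, `PersonBirthDate` yields `Date` and `EmployeeActiveIndicator` yields `Indicator`, while a name such as `PersonAge` carries no recognized representation term and returns `None`.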
In metadata, a data element definition is a human readable phrase or sentence associated with a data element within a data dictionary that describes the meaning or semantics of a data element.
In metadata, an indicator is a Boolean value that may contain only the values true or false. The definition of an Indicator must include the meaning of a true value and should also include the meaning if the value is false.
The AgMES initiative was developed by the Food and Agriculture Organization (FAO) of the United Nations and aims to encompass issues of semantic standards in the domain of agriculture with respect to description, resource discovery, interoperability, and data exchange for different types of information resources.
Geospatial metadata is a type of metadata applicable to geographic data and information. Such objects may be stored in a geographic information system (GIS) or may simply be documents, data-sets, images or other objects, services, or related items that exist in some other native environment but whose features may be appropriate to describe in a (geographic) metadata catalog.
The Open Packaging Conventions (OPC) is a container-file technology initially created by Microsoft to store a combination of XML and non-XML files that together form a single entity such as an Open XML Paper Specification (OpenXPS) document. OPC-based file formats combine the advantages of leaving the independent file entities embedded in the document intact and resulting in much smaller files compared to normal use of XML.
PREservation Metadata: Implementation Strategies (PREMIS) is the de facto digital preservation metadata standard.
Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata.
In computing, a data definition specification (DDS) is a guideline to ensure comprehensive and consistent data definition. It represents the attributes required to quantify data definition. A comprehensive data definition specification encompasses enterprise data, the hierarchy of data management, prescribed guidance enforcement and criteria to determine compliance.
The Journal Article Tag Suite (JATS) is an XML format used to describe scientific literature published online. It is a technical standard developed by the National Information Standards Organization (NISO) and approved by the American National Standards Institute with the code Z39.96-2012.