Metadata publishing

Metadata publishing is the process of making metadata data elements available to external users, both people and machines, using a formal review process and a commitment to change control.

Metadata is "data [information] that provides information about other data". Many distinct types of metadata exist, among these descriptive metadata, structural metadata, administrative metadata, reference metadata and statistical metadata.

In metadata, the term data element is an atomic unit of data that has precise meaning or precise semantics. A data element has:

  1. An identification such as a data element name
  2. A clear data element definition
  3. One or more representation terms
  4. Optional enumerated values (see Code (metadata))
  5. A list of synonyms to data elements in other metadata registries (see Synonym ring)
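
As a rough illustration, the five parts listed above can be modelled as a small record. This is a minimal Python sketch: the class name, the field names and the sample element are hypothetical and are not taken from any particular registry standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record mirroring the five parts of a data element."""

    name: str                                 # 1. identification
    definition: str                           # 2. definition
    representation_terms: tuple[str, ...]     # 3. e.g. "Code", "Date"
    enumerated_values: tuple[str, ...] = ()   # 4. optional permitted codes
    synonyms: tuple[str, ...] = ()            # 5. names used in other registries

    def allows(self, value: str) -> bool:
        """Accept any value unless an enumeration restricts the element."""
        return not self.enumerated_values or value in self.enumerated_values


# Illustrative element only; the names and codes are made up.
gender = DataElement(
    name="PersonGenderCode",
    definition="A code identifying the gender of a person.",
    representation_terms=("Code",),
    enumerated_values=("F", "M", "U"),
    synonyms=("hr:SexCode",),
)
```

Making the record immutable (frozen) reflects the idea that a published definition changes only through a new release, not by editing it in place.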

Metadata publishing is the foundation upon which advanced distributed computing functions are being built. But like building foundations, care must be taken in metadata publishing systems to ensure the structural integrity of the systems built on top of them.

Distributed computing is a field of computer science that studies distributed systems. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.

Definition of metadata publishing

Published metadata has the following characteristics:

  1. Metadata structures are available to the general public on a public web site or by download
  2. A documented review and approval process governs adding or updating data elements in the system
  3. New releases are made available without disturbing prior versions
  4. The publishing organization makes a commitment to a change control process
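
The third characteristic, releasing new versions without disturbing prior ones, can be pictured as an append-only store. The following is a minimal sketch under that assumption; the class, its methods and the sample definitions are hypothetical.

```python
from typing import Optional


class PublishedRegistry:
    """Hypothetical append-only store: a release is kept, never overwritten."""

    def __init__(self) -> None:
        self._releases: dict[str, dict[str, str]] = {}
        self._order: list[str] = []

    def publish(self, version: str, definitions: dict[str, str]) -> None:
        """Add a new release; refuse to replace one that already exists."""
        if version in self._releases:
            raise ValueError(f"release {version} is already published")
        self._releases[version] = dict(definitions)  # copy, so callers cannot mutate it
        self._order.append(version)

    def lookup(self, name: str, version: Optional[str] = None) -> str:
        """Return a definition from the given release, or from the latest one."""
        release = self._releases[version or self._order[-1]]
        return release[name]


registry = PublishedRegistry()
registry.publish("1.0", {"PersonBirthDate": "The date a person was born."})
registry.publish("1.1", {"PersonBirthDate": "The calendar date on which a person was born."})
```

Consumers that pinned release 1.0 keep receiving the 1.0 wording after 1.1 appears, which is the behaviour the list above asks for.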

Benefits of metadata publishing

When classifying the benefits of metadata publishing, two groups are usually considered. External parties are usually consumers of information who are not part of the publishing organization. Internal parties are usually the various business units or departments within the organization.

Benefits to external parties

  1. Allows external systems (both people and agents) to have a clear understanding of the semantics of data elements in a system
  2. Allows third parties to build semantic maps between data models and import and export data between systems
  3. Promotes service-oriented architectures and allows horizontal sharing of information between traditional information silos
  4. Allows systems to participate in accurately indexed and federated search processes
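
The semantic maps mentioned in item 2 can be as simple as a lookup from one system's field names to the published element names. The sketch below assumes such a one-to-one mapping; the field names, element names and the function are hypothetical.

```python
# Hypothetical map from a local system's field names to published element names.
SEMANTIC_MAP = {
    "cust_nm": "CustomerName",
    "dob": "PersonBirthDate",
}


def translate(record: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename mapped fields; fail loudly on fields with no published equivalent."""
    unmapped = sorted(set(record) - set(mapping))
    if unmapped:
        raise KeyError(f"no published mapping for: {', '.join(unmapped)}")
    return {mapping[key]: value for key, value in record.items()}
```

Rejecting unmapped fields, rather than passing them through silently, keeps an export from carrying data whose meaning the receiving system cannot look up.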

Benefits to internal parties

  1. Allows parties from diverse business units to agree on shared data definitions and to separate out department- or function-specific definitions
  2. Makes extract, transform, load (ETL) operations more precise for data warehousing
  3. Allows user interface designers to access a common pool of screen and report header labels
  4. Promotes model-driven architecture

Objections to metadata publishing

Core processes in metadata publishing

The following are some of the core processes in metadata publishing:

  1. Gathering of metadata requirements
  2. Selection of metadata registry and metadata publishing tools
  3. Training project participants in metadata concepts
  4. Stakeholder group formation
  5. Metadata harvesting
  6. Glossary consolidation
  7. Initial upper ontology construction (abstract data elements)
  8. Draft data element loading
  9. Data element review process
  10. Publishing approved metadata elements in a variety of output formats (see below)
  11. Creation and maintenance of versions and deprecation of unused or redundant data elements
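
Steps 8 to 11 describe a lifecycle for each data element: loaded as a draft, reviewed, published, and eventually retired. One way to picture this is as a small state machine. The statuses and permitted transitions below are an assumed workflow for illustration, not one prescribed by the text.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


# Hypothetical workflow: the status changes the review process permits.
ALLOWED = {
    Status.DRAFT: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.DRAFT, Status.APPROVED},
    Status.APPROVED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the change skips a review step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

In this sketch a draft cannot be approved without passing through review, and a deprecated element has no outgoing transitions, so it stays on record without returning to use.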

File format metadata publishing

Organizations that create applications that store data in file systems can also publish metadata definitions. One common way to do this is to store application data in a compressed XML file format. The XML files can be uncompressed and validated against an external XML Schema. The open-source FreeMind tool is one example of this approach.
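
The uncompress-then-check step can be sketched with the Python standard library. Two assumptions are worth stating plainly: gzip is used here only as an example compression format, and the standard library has no XML Schema validator, so a simple root-element check stands in for real schema validation, which would need a third-party library. The function name and the sample document are hypothetical and do not describe FreeMind's actual file layout.

```python
import gzip
import xml.etree.ElementTree as ET


def load_compressed_xml(payload: bytes, expected_root: str) -> ET.Element:
    """Uncompress gzip data, parse it as XML and check the root element name."""
    root = ET.fromstring(gzip.decompress(payload))  # raises ParseError if malformed
    if root.tag != expected_root:
        raise ValueError(f"expected <{expected_root}>, found <{root.tag}>")
    return root


# Hypothetical application document, compressed as it might be stored on disk.
stored = gzip.compress(b'<map version="1.0"><node TEXT="Metadata"/></map>')
document = load_compressed_xml(stored, expected_root="map")
```

Because the stored format is ordinary XML once uncompressed, any party holding the published schema can run the same check without access to the original application.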

FreeMind software

FreeMind is a free mind mapping application written in Java. FreeMind is licensed under the GNU General Public License Version 2. It provides extensive export capabilities. It runs on Microsoft Windows, Linux, and macOS via the Java Runtime Environment.

Metadata publishing formats

  1. HTML - used for browsing a web site and indexing by text-based search engines
  2. Web Ontology Language (OWL) - used by metadata search engines such as Swoogle
  3. XML Metadata Interchange (XMI) - OMG standard for exchanging metadata
  4. Common Warehouse Metamodel (CWM) - OMG standard for data warehouse metadata
  5. Topic maps - an ISO standard for the representation and interchange of knowledge, with an emphasis on the findability of information.
  6. KM3 (Kernel Meta Meta Model), as used in the Metamodel Zoos. The AtlanticZoo is an open-source library of more than 100 metamodels under the EPL license. KM3 is a simple domain-specific language for specifying metamodels. A number of transformations are available to translate from KM3 to other notations such as XMI.
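
For the simplest of these formats, HTML, publishing can amount to rendering each approved element name and definition into a page. The following minimal sketch assumes the definitions are held as a plain name-to-text mapping; the function name and the choice of a definition list are illustrative only.

```python
from html import escape


def to_html(elements: dict[str, str]) -> str:
    """Render element names and definitions as an HTML definition list."""
    rows = [
        f"  <dt>{escape(name)}</dt>\n  <dd>{escape(definition)}</dd>"
        for name, definition in sorted(elements.items())
    ]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"
```

Escaping every name and definition matters here because definitions are free text and may contain characters such as `<` or `&` that would otherwise break the page.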

See also

A bibliographic database is a database of bibliographic records, an organized digital collection of references to published literature, including journal and newspaper articles, conference proceedings, reports, government and legal publications, patents, books, etc. In contrast to library catalogue entries, a large proportion of the bibliographic records in bibliographic databases describe articles, conference papers, etc., rather than complete monographs, and they generally contain very rich subject descriptions in the form of keywords, subject classification terms, or abstracts.

Data governance is a data management concept concerning the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security. It also includes establishing processes to ensure effective data management throughout the enterprise, such as accountability for the adverse effects of poor data quality, and ensuring that the data an enterprise holds can be used by the entire organization.

Semantic technology

In software, semantic technology encodes meanings separately from data and content files, and separately from application code.

Related Research Articles

The Unified Modeling Language (UML) is a general-purpose, developmental, modeling language in the field of software engineering that is intended to provide a standard way to visualize the design of a system.

The XML Metadata Interchange (XMI) is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML).

Meta-Object Facility

The Meta-Object Facility (MOF) is an Object Management Group (OMG) standard for model-driven engineering. Its purpose is to provide a type system for entities in the CORBA architecture and a set of interfaces through which those types can be created and manipulated. The official reference page may be found at OMG's website.

Model-driven architecture (MDA) is a software design approach for the development of software systems. It provides a set of guidelines for the structuring of specifications, which are expressed as models. Model-driven architecture is a kind of domain engineering, and supports model-driven engineering of software systems. It was launched by the Object Management Group (OMG) in 2001.

The Object Constraint Language (OCL) is a declarative language for describing rules that apply to Unified Modeling Language (UML) models. It was developed at IBM and is now part of the UML standard. Initially, OCL was merely a formal specification language extension for UML. OCL may now be used with any Meta-Object Facility (MOF) Object Management Group (OMG) meta-model, including UML. The Object Constraint Language is a precise text language that provides constraint and object query expressions on any MOF model or meta-model that cannot otherwise be expressed by diagrammatic notation. OCL is a key component of the new OMG standard recommendation for transforming models, the Queries/Views/Transformations (QVT) specification.

The common warehouse metamodel (CWM) defines a specification for modeling metadata for relational, non-relational, multi-dimensional, and most other objects found in a data warehousing environment. The specification is released and owned by the Object Management Group, which also claims a trademark in the use of "CWM".

Metamodeling

A metamodel or surrogate model is a model of a model, and metamodeling is the process of generating such metamodels. Thus metamodeling or meta-modeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems. As its name implies, this concept applies the notions of meta- and modeling in software engineering and systems engineering. Metamodels are of many types and have diverse applications.

A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled method.

ISO/IEC 11179 is an international standard for representing metadata for an organization in a metadata registry.

A representation term is a word, or a combination of words, that semantically represents the data type of a data element. A representation term is commonly referred to as a class word by those familiar with data dictionaries. ISO/IEC 11179-5:2005 defines a representation term as a designation of an instance of a representation class. As used in ISO/IEC 11179, the representation term is that part of a data element name that provides a semantic pointer to the underlying data type. A representation class is a class of representations. This representation class provides a way to classify or group data elements.

The semantic spectrum is a series of increasingly precise or rather semantically expressive definitions for data elements in knowledge representations, especially for machine use.

Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system.

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

Eclipse Modeling Framework (EMF) is an Eclipse-based modeling framework and code generation facility for building tools and other applications based on a structured data model.

Geospatial metadata is a type of metadata that is applicable to objects that have an explicit or implicit geographic extent, i.e. are associated with some position on the surface of the globe. Such objects may be stored in a geographic information system (GIS) or may simply be documents, data-sets, images or other objects, services, or related items that exist in some other native environment but whose features may be appropriate to describe in a (geographic) metadata catalog.

The Business Process Definition Metamodel (BPDM) is a standard definition of concepts used to express business process models, adopted by the OMG. Metamodels define concepts, relationships, and semantics for exchange of user models between different modeling tools. The exchange format is defined by XSD and XMI, a specification for transformation of OMG metamodels to XML. Pursuant to the OMG's policies, the metamodel is the result of an open process involving submissions by member organizations, following a Request for Proposal (RFP) issued in 2003. BPDM was adopted in initial form in July 2007, and finalized in July 2008.

Knowledge Discovery Metamodel (KDM) is a publicly available specification from the Object Management Group (OMG). KDM is a common intermediate representation for existing software systems and their operating environments, that defines common metadata required for deep semantic integration of Application Lifecycle Management tools. KDM was designed as the OMG's foundation for software modernization, IT portfolio management and software assurance. KDM uses OMG's Meta-Object Facility to define an XMI interchange format between tools that work with existing software as well as an abstract interface (API) for the next-generation assurance and modernization tools. KDM standardizes existing approaches to knowledge discovery in software engineering artifacts, also known as software mining.

The Semantics of Business Vocabulary and Business Rules (SBVR) is an adopted standard of the Object Management Group (OMG) intended to be the basis for formal and detailed natural language declarative description of a complex entity, such as a business. SBVR is intended to formalize complex compliance rules, such as operational rules for an enterprise, security policy, standard compliance, or regulatory compliance rules. Such formal vocabularies and rules can be interpreted and used by computer systems. SBVR is an integral part of the OMG's model-driven architecture (MDA).

The Ontology Definition MetaModel (ODM) is an Object Management Group (OMG) specification to make the concepts of Model-Driven Architecture applicable to the engineering of ontologies. Hence, it links Common Logic (CL), the Web Ontology Language (OWL), and the Resource Description Framework (RDF).