Schema crosswalk

A schema crosswalk is a table that shows equivalent elements (or "fields") in more than one database schema. It maps the elements in one schema to the equivalent elements in another.

Crosswalk tables are often employed within or in parallel to enterprise systems, especially when multiple systems are interfaced or when a system includes legacy data. In the context of interfaces, they function as an internal extract, transform, load (ETL) mechanism.

For example, this is a metadata crosswalk from MARC standards to Dublin Core:

| MARC field | Dublin Core element |
| --- | --- |
| 260$c (Date of publication, distribution, etc.) | Date.Created |
| 522 (Geographic Coverage Note) | Coverage.Spatial |
| 300$a (Physical Description) | Format.Extent |
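As a rough illustration, the crosswalk in the table above can be held as a lookup table and applied to a record. This is a minimal sketch, not a standard implementation: the function name `apply_crosswalk`, the flattened `{field: value}` record layout, and the sample values are all assumptions made for the example, and the MARC subfields are written in the conventional tag-then-subfield form (`260$c`, `300$a`).

```python
# Crosswalk from the table above: MARC field -> Dublin Core element.
MARC_TO_DC = {
    "260$c": "Date.Created",
    "522": "Coverage.Spatial",
    "300$a": "Format.Extent",
}


def apply_crosswalk(record, crosswalk):
    """Return (translated, unmapped) for a flat {field: value} record."""
    translated = {}
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # No equivalent element in the target scheme: set aside for review.
            unmapped[field] = value
        else:
            translated[target] = value
    return translated, unmapped


# Hypothetical MARC record, flattened for illustration only.
marc_record = {
    "260$c": "1923",
    "300$a": "14 leaves",
    "522": "County-level data",
    "500": "General note with no mapping in this crosswalk",
}

dc_record, leftovers = apply_crosswalk(marc_record, MARC_TO_DC)
print(dc_record)
print(leftovers)
```

Returning the unmapped fields separately, rather than silently dropping them, keeps the decisions that a crosswalk cannot make visible to whoever reviews the output.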

Crosswalks show where data from one scheme belongs in a different scheme. They are often used by libraries, archives, museums, and other cultural institutions to translate data to or from MARC standards, Dublin Core, the Text Encoding Initiative (TEI), and other metadata schemes.

For example, suppose an archive has a MARC record in its catalog describing a manuscript, makes a digital copy of that manuscript, and wants to display it on the web along with the information from the catalog. It will have to translate the data from the MARC catalog record into a different format, such as the Metadata Object Description Schema (MODS), that is viewable on a webpage. Because MARC has different fields than MODS, decisions must be made about where to put the data in MODS. This type of "translating" from one format to another is often called "metadata mapping" or "field mapping" and is related to "data mapping" and "semantic mapping".

Crosswalks also have several technical capabilities. They help databases using different metadata schemes to share information. They help metadata harvesters create union catalogs. They enable search engines to search multiple databases simultaneously with a single query.
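The single-query case can be sketched as follows. Everything here is hypothetical: the database names, the sample records, the shared query fields `title` and `date`, and the function `federated_search` are assumptions chosen for illustration, and real systems would query remote services rather than in-memory lists.

```python
# Each hypothetical database uses its own field names for the same concept.
# The crosswalks map a shared query field to each scheme's local field.
CROSSWALKS = {
    "marc_catalog": {"title": "245", "date": "260$c"},
    "dc_repository": {"title": "Title", "date": "Date.Created"},
}

DATABASES = {
    "marc_catalog": [
        {"245": "Letters from the coast", "260$c": "1923"},
        {"245": "Harbor surveys", "260$c": "1890"},
    ],
    "dc_repository": [
        {"Title": "Coast and harbor photographs", "Date.Created": "1931"},
    ],
}


def federated_search(field, term):
    """Search every database for `term` in the local equivalent of `field`."""
    hits = []
    for db_name, records in DATABASES.items():
        local_field = CROSSWALKS[db_name].get(field)
        if local_field is None:
            continue  # this scheme has no equivalent field
        for record in records:
            # Case-insensitive substring match on the local field's value.
            if term.lower() in record.get(local_field, "").lower():
                hits.append((db_name, record[local_field]))
    return hits


results = federated_search("title", "coast")
for db_name, title in results:
    print(db_name, title)
```

The query is written once against the shared field name, and the crosswalk decides which local field each database is asked about.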

Challenges for crosswalks

One of the biggest challenges for crosswalks is that no two metadata schemes are 100% equivalent. One scheme may have a field that doesn't exist in another scheme or a field that is split into two different fields in another scheme; this is why you often lose data when mapping from a complex scheme to a simpler one. For example, when mapping from MARC to Simple Dublin Core, you lose the distinction between types of titles:

| MARC field | Dublin Core element |
| --- | --- |
| 210 Abbreviated Title | Title |
| 222 Key Title | Title |
| 240 Uniform Title | Title |
| 242 Translated Title | Title |
| 245 Title Statement | Title |
| 246 Variant Title | Title |

Simple Dublin Core only has one "Title" element, so all of the different types of MARC titles get lumped together without further distinctions. This is called "many-to-one" mapping. This is also why once you've translated these titles into Simple Dublin Core, you can't translate them back into MARC. Once they're Simple Dublin Core, you've lost the MARC information about what types of titles they are, so when you map from Simple Dublin Core back to MARC, all the data in the "Title" element maps to the basic MARC 245 Title Statement field. [1]

| Dublin Core element | MARC field |
| --- | --- |
| Title | 245 Title Statement |
| Title | 245 Title Statement |
| Title | 245 Title Statement |
| Title | 245 Title Statement |
| Title | 245 Title Statement |
| Title | 245 Title Statement |

This is why crosswalks are said to be "lateral" (one-way) mappings from one scheme to another. Separate crosswalks would be required to map from scheme A to scheme B and from scheme B to scheme A. [2]
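The loss in a round trip can be seen in a short sketch. The two crosswalks follow the title mappings described above; the helper names `translate` and `is_reversible`, the list-of-pairs record layout, and the sample title values are assumptions made for illustration.

```python
# Forward crosswalk: many MARC title fields collapse into one element.
MARC_TO_SIMPLE_DC = {
    "210": "Title",
    "222": "Title",
    "240": "Title",
    "242": "Title",
    "245": "Title",
    "246": "Title",
}

# Reverse crosswalk: a separate table, since the forward one cannot be inverted.
SIMPLE_DC_TO_MARC = {"Title": "245"}


def translate(pairs, crosswalk):
    """Map (field, value) pairs through a crosswalk, skipping unmapped fields."""
    return [(crosswalk[field], value) for field, value in pairs if field in crosswalk]


def is_reversible(crosswalk):
    """A crosswalk can be inverted only if no two sources share a target."""
    targets = list(crosswalk.values())
    return len(targets) == len(set(targets))


# Hypothetical title data; pairs are used because fields may repeat.
marc_titles = [
    ("210", "J. appl. phys."),
    ("245", "Journal of applied physics"),
    ("246", "JAP"),
]

dc_titles = translate(marc_titles, MARC_TO_SIMPLE_DC)
round_trip = translate(dc_titles, SIMPLE_DC_TO_MARC)

print(round_trip)
print(is_reversible(MARC_TO_SIMPLE_DC))
```

After the round trip every value is still present, but each one is labeled 245, so the distinction between abbreviated, main, and variant titles is gone.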

Difficulties in mapping

Other mapping problems arise when:

Some of these problems are not fixable. As Karen Coyle says in "Crosswalking Citation Metadata: The University of California's Experience,"

"The more metadata experience we have, the more it becomes clear that metadata perfection is not attainable, and anyone who attempts it will be sorely disappointed. When metadata is crosswalked between two or more unrelated sources, there will be data elements that cannot be reconciled in an ideal manner. The key to a successful metadata crosswalk is intelligent flexibility. It is essential to focus on the important goals and be willing to compromise to reach a practical conclusion to projects." [3]

References

  1. "Dublin Core to MARC Crosswalk," Network Development and MARC Standards Office, Library of Congress.
  2. Caplan, Priscilla (2003). Metadata fundamentals for all librarians. Chicago: American Library Association. p. 39. ISBN 0838908470.
  3. Coyle, Karen. "Crosswalking Citation Metadata: The University of California's Experience," in "Metadata in Practice," Diane I. Hillmann and Elaine L. Westbrooks, eds., American Library Association, Chicago, 2004, p. 91.