Schema crosswalk

Last updated

A schema crosswalk is a table that shows equivalent elements (or "fields") in more than one database schema. It maps the elements in one schema to the equivalent elements in another.

Contents

Crosswalk tables are often employed within or in parallel to enterprise systems, especially when multiple systems are interfaced or when the system includes legacy system data. In the context of Interfaces, they function as an internal extract, transform, load (ETL) mechanism.

For example, this is a metadata crosswalk from MARC standards to Dublin Core:

MARC fieldDublin Core element
$260c (Date of publication, distribution, etc.)Date.Created
522 (Geographic Coverage Note)Coverage.Spatial
$300a (Physical Description)Format.Extent

Crosswalks show people where to put the data from one scheme into a different scheme. They are often used by libraries, archives, museums, and other cultural institutions to translate data to or from MARC standards, Dublin Core, Text Encoding Initiative (TEI), and other metadata schemes. For example, an archive has a MARC record in its catalog describing a manuscript. Suppose the archive makes a digital copy of that manuscript and wants to display it on the web along with the information from the catalog. In that case, it will have to translate the data from the MARC catalog record into a different format, such as Metadata Object Description Schema, that is viewable on a webpage. Because MARC has various fields than MODS, decisions must be made about where to put the data into MODS. This type of "translating" from one format to another is often called "metadata mapping" or "field mapping," and is related to "data mapping", and "semantic mapping".

Crosswalks also have several technical capabilities. They help databases using different metadata schemes to share information. They help metadata harvesters create union catalogs. They enable search engines to search multiple databases simultaneously with a single query.

Challenges for crosswalks

One of the biggest challenges for crosswalks is that no two metadata schemes are 100% equivalent. One scheme may have a field that doesn't exist in another scheme or a field that is split into two different fields in another scheme; this is why data is often lost when mapping from a complex scheme to a simpler one. For example, when mapping from MARC to Simple Dublin Core, the distinction between types of titles is lost:

MARC fieldDublin Core element
210 Abbreviated TitleTitle
222 Key TitleTitle
240 Uniform TitleTitle
242 Translated TitleTitle
245 Title StatementTitle
246 Variant TitleTitle

Simple Dublin Core only has one "Title" element, so all of the different types of MARC titles get lumped together without further distinctions. A future attempt to convert the metadata back into MARC would enter the information in the basic MARC 245 Title Statement field, with none of the original distinctions. [1]

Dublin Core elementMARC field
Title245 Title Statement
Title245 Title Statement
Title245 Title Statement
Title245 Title Statement
Title245 Title Statement
Title245 Title Statement

This is why crosswalks are said to be "lateral" (one-way) mappings from one scheme to another. Separate crosswalks would be required to map from scheme A to scheme B and from scheme B to scheme A. [2]

Difficulties in mapping

Other mapping problems arise when:

Some of these problems are not fixable. As Karen Coyle says in "Crosswalking Citation Metadata: The University of California's Experience,"

"The more metadata experience we have, the more it becomes clear that metadata perfection is not attainable, and anyone who attempts it will be sorely disappointed. When metadata is crosswalked between two or more unrelated sources, there will be data elements that cannot be reconciled in an ideal manner. The key to a successful metadata crosswalk is intelligent flexibility. It is essential to focus on the important goals and be willing to compromise to reach a practical conclusion to projects." [3]

See also

Related Research Articles

<span class="mw-page-title-main">Dublin Core</span> Standardized set of metadata elements

The Dublin Core vocabulary, also known as the Dublin Core Metadata Terms (DCMT), is a general purpose metadata vocabulary for describing resources of any type. It was first developed for describing web content in the early days of the World Wide Web. The Dublin Core Metadata Initiative (DCMI) is responsible for maintaining the Dublin Core vocabulary.

MARC is a standard set of digital formats for the machine-readable description of items catalogued by libraries, such as books, DVDs, and digital resources. Computerized library catalogs and library management software need to structure their catalog records as per an industry-wide standard, which is MARC, so that bibliographic information can be shared freely between computers. The structure of bibliographic records almost universally follows the MARC standard. Other standards work in conjunction with MARC, for example, Anglo-American Cataloguing Rules (AACR)/Resource Description and Access (RDA) provide guidelines on formulating bibliographic data into the MARC record structure, while the International Standard Bibliographic Description (ISBD) provides guidelines for displaying MARC records in a standard, human-readable form.

<span class="mw-page-title-main">XBRL</span> Exchange format for business information

XBRL is a freely available and global framework for exchanging business information. XBRL allows the expression of semantics commonly required in business reporting. The standard was originally based on XML, but now additionally supports reports in JSON and CSV formats, as well as the original XML-based syntax. XBRL is also increasingly used in its Inline XBRL variant, which embeds XBRL tags into an HTML document. One common use of XBRL is the exchange of financial information, such as in a company's annual financial report. The XBRL standard is developed and published by XBRL International, Inc. (XII).

ISO 639-3:2007, Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. It defines three-letter codes for identifying languages. The standard was published by International Organization for Standardization (ISO) on 1 February 2007.

The PBCore metadata standard was created by the public broadcasting community in the United States of America for use by public broadcasters and related communities that manage audiovisual assets, including libraries, archives, independent producers, etc. PBCore is organized as a set of specified fields that can be used in database applications, and it can be used as a data model for media cataloging and asset management systems. As an XML schema, PBCore enables data exchange between media collections, systems and organizations.

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.

In computing and data management, data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks, including:

<span class="mw-page-title-main">Learning object metadata</span> Data model

Learning Object Metadata is a data model, usually encoded in XML, used to describe a learning object and similar digital resources used to support learning. The purpose of learning object metadata is to support the reusability of learning objects, to aid discoverability, and to facilitate their interoperability, usually in the context of online learning management systems (LMS).

The e-Government Metadata Standard, e-GMS, is the UK e-Government Metadata Standard. It defines how UK public sector bodies should label content such as web pages and documents to make such information more easily managed, found and shared.

Catalogue Service for the Web (CSW), sometimes seen as Catalogue Service - Web, is a standard for exposing a catalogue of geospatial records in XML on the Internet. The catalogue is made up of records that describe geospatial data, geospatial services, and related resources.

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, which include both commercial and scientific domains. Data integration appears with increasing frequency as the volume, complexity and the need to share existing data explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from a heterogeneous database system and transformed to a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining when analyzing and extracting information from existing databases that can be useful for Business information.

The AgMES initiative was developed by the Food and Agriculture Organization (FAO) of the United Nations and aims to encompass issues of semantic standards in the domain of agriculture with respect to description, resource discovery, interoperability, and data exchange for different types of information resources.

Agricultural Information Management Standards (AIMS) is a web site managed by the Food and Agriculture Organization of the United Nations (FAO) for accessing and discussing agricultural information management standards, tools and methodologies connecting information workers worldwide to build a global community of practice. Information management standards, tools and good practices can be found on AIMS:

The Metadata Object Description Schema (MODS) is an XML-based bibliographic description schema developed by the United States Library of Congress' Network Development and Standards Office. MODS was designed as a compromise between the complexity of the MARC format used by libraries and the extreme simplicity of Dublin Core metadata.

PREservation Metadata: Implementation Strategies (PREMIS) is the de facto digital preservation metadata standard.

<span class="mw-page-title-main">Metadata</span> Data

Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including:

A metadata standard is a requirement which is intended to establish a common understanding of the meaning or semantics of the data, to ensure correct and proper use and interpretation of the data by its owners and users. To achieve this common understanding, a number of characteristics, or attributes of the data have to be defined, also known as metadata.

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations as well as specific formats for material such as mathematical formulas, graphics, bibliographies etc.

Lightweight Information Describing Objects (LIDO) is an XML schema for describing museum or collection objects. Memory institutions use LIDO for “exposing, sharing and connecting data on the web”. It can be applied to all kind of disciplines in cultural heritage, e.g. art, natural history, technology, etc. LIDO is a specific application of CIDOC CRM.

References

  1. "Dublin Core to MARC Crosswalk," Network Development and MARC Standards Office, Library of Congress
  2. Caplan, Priscilla (2003). Metadata fundamentals for all librarians. Chicago: American Library Association. pp.  39. ISBN   0838908470.
  3. in "Metadata in Practice" Diane I. Hillmann and Elaine L. Westbrooks, eds., American Library Association, Chicago, 2004, p. 91.