MARC standards

MARC
Filename extension: .mrc, .marc
Internet media type: application/marc

MARC (machine-readable cataloging) is a standard set of digital formats for the machine-readable description of items catalogued by libraries, such as books, DVDs, and digital resources. Computerized library catalogs and library management software structure their catalog records according to an industry-wide standard, MARC, so that bibliographic information can be shared freely between computers. The structure of bibliographic records almost universally follows the MARC standard. Other standards work in conjunction with MARC: for example, Anglo-American Cataloguing Rules (AACR) and Resource Description and Access (RDA) provide guidelines on formulating bibliographic data into the MARC record structure, while the International Standard Bibliographic Description (ISBD) provides guidelines for displaying MARC records in a standard, human-readable form.

History

Working with the Library of Congress, American computer scientist Henriette Avram developed MARC between 1965 and 1968, making it possible to create records that could be read by computers and shared between libraries. [1] [2] By 1971, MARC formats had become the US national standard for dissemination of bibliographic data. Two years later, they became the international standard. Several versions of MARC are in use around the world. The most widely used are MARC 21, created in 1999 by harmonizing the U.S. and Canadian MARC formats, and UNIMARC. UNIMARC is maintained by the Permanent UNIMARC Committee of the International Federation of Library Associations and Institutions (IFLA) and is widely used in Europe.

The MARC 21 family of standards now includes formats for authority records, holdings records, classification schedules, and community information, in addition to the format for bibliographic records.

Record structure and field designations

The MARC standards define three aspects of a MARC record: the field designations within each record, the structure of the record, and the actual content of the record itself.

Field designations

Each field in a MARC record provides particular information about the item the record describes, such as the author, title, publisher, date, language, or media type. Because it was first developed at a time when computing power was low and storage space precious, MARC uses a simple three-digit numeric code (from 001 to 999) to identify each field in the record. For example, MARC defines field 100 as the primary author of a work, field 245 as the title, and field 260 as the publisher.

Fields above 008 are further divided into subfields, each designated by a single letter or number. Field 260, for example, is divided into subfield "a" for the place of publication, "b" for the name of the publisher, and "c" for the date of publication.
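
The tag and subfield scheme described above can be modeled in a few lines of Python. This is only an illustrative sketch: the Field class, the find helper, and the sample values are hypothetical and do not come from any MARC library or real catalog record.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    """One MARC field: a three-digit tag plus ordered (code, value) subfields."""
    tag: str
    subfields: List[Tuple[str, str]] = field(default_factory=list)

    def get(self, code: str) -> List[str]:
        """Return every value stored under the given subfield code."""
        return [value for c, value in self.subfields if c == code]


def find(record: List[Field], tag: str) -> List[Field]:
    """Return all fields in the record that carry the given tag."""
    return [f for f in record if f.tag == tag]


# Hypothetical sample record using the tags named in the text:
# 100 = primary author, 245 = title, 260 = publisher.
record = [
    Field("100", [("a", "Doe, Jane")]),
    Field("245", [("a", "An example title")]),
    Field("260", [("a", "Springfield :"), ("b", "Example Press,"), ("c", "1999.")]),
]

publisher = find(record, "260")[0]
print("place:", publisher.get("a"))
print("name: ", publisher.get("b"))
print("date: ", publisher.get("c"))
```

Subfields are kept as an ordered list of pairs rather than a dictionary because a subfield code may repeat within one field and the order of subfields is significant.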

Record structure

MARC records are typically stored and transmitted as binary files, usually with several MARC records concatenated together into a single file. MARC uses the ISO 2709 standard to define the structure of each record. This includes a marker to indicate where each record begins and ends, as well as a set of characters at the beginning of each record that provide a directory for locating the fields and subfields within the record.
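
As a rough illustration of this layout, the following Python sketch builds and then parses a record in the ISO 2709 style. The byte values and offsets are assumptions that the article does not state: a 24-character leader, 12-character directory entries, and the control characters 0x1D, 0x1E, and 0x1F reflect the MARC 21 conventions as commonly documented. The function names and the sample data are hypothetical.

```python
FIELD_TERMINATOR = b"\x1e"    # ends the directory and every field
RECORD_TERMINATOR = b"\x1d"   # ends the record
SUBFIELD_DELIMITER = b"\x1f"  # precedes each subfield code
LEADER_LEN = 24
ENTRY_LEN = 12                # tag (3) + field length (4) + start offset (5)


def build_record(fields):
    """Serialize (tag, raw_bytes) pairs into one record (sample-data helper)."""
    directory, data = b"", b""
    for tag, raw in fields:
        body = raw + FIELD_TERMINATOR
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base = LEADER_LEN + len(directory)   # where the field data begins
    total = base + len(data) + 1         # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def split_records(blob):
    """Split concatenated records using the length in leader bytes 0-4."""
    records, pos = [], 0
    while pos < len(blob):
        length = int(blob[pos:pos + 5].decode("ascii"))
        records.append(blob[pos:pos + length])
        pos += length
    return records


def parse_record(raw):
    """Return the leader and a list of (tag, field_bytes) via the directory."""
    leader = raw[:LEADER_LEN].decode("ascii")
    base = int(leader[12:17])
    directory = raw[LEADER_LEN:base - 1]  # drop the directory terminator
    fields = []
    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # length includes the field terminator, so stop one byte early
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return leader, fields


def parse_subfields(body):
    """Split a data field into its two indicators and (code, value) pairs."""
    indicators = body[:2].decode("ascii")
    parts = body[2:].split(SUBFIELD_DELIMITER)[1:]
    return indicators, [(p[:1].decode("ascii"), p[1:].decode("utf-8")) for p in parts]


# Hypothetical sample: one control field and one title field.
sample = build_record([("001", b"12345"), ("245", b"10\x1faAn example title")])
for raw in split_records(sample + sample):
    leader, fields = parse_record(raw)
    for tag, body in fields:
        if tag < "010":   # control fields carry no indicators or subfields
            print(tag, body.decode("utf-8"))
        else:
            print(tag, parse_subfields(body))
```

Reading the record length from the leader, rather than scanning for the terminator byte, lets a reader step through a concatenated file without inspecting field contents.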

In 2002, the Library of Congress developed the MARCXML schema as an alternative record structure, allowing MARC records to be represented in XML; the fields remain the same, but those fields are expressed in the record in XML markup. Libraries typically expose their records as MARCXML via a web service, often following the SRU or OAI-PMH standards.
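
A minimal sketch of reading such a record with Python's standard library follows. The element names and the namespace are those of the Library of Congress MARCXML "slim" schema as commonly published; the sample record and the subfields helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML schema; "marc" is just a local prefix.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record: same tags and subfield codes as binary MARC,
# expressed as XML elements and attributes.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Springfield :</subfield>
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1999.</subfield>
  </datafield>
</record>"""


def subfields(record, tag):
    """Return one list of (code, value) pairs per datafield with this tag."""
    result = []
    for datafield in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        pairs = [(sf.get("code"), sf.text)
                 for sf in datafield.findall("marc:subfield", NS)]
        result.append(pairs)
    return result


record = ET.fromstring(SAMPLE)
print(record.find("marc:controlfield[@tag='001']", NS).text)
print(subfields(record, "260"))
```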

Content

MARC encodes information about a bibliographic item, not information about the content of that item; this means it is a metadata transmission standard, not a content standard. The actual content that a cataloger places in each MARC field is usually governed and defined by standards outside of MARC, except for a handful of fixed fields defined by the MARC standards themselves. Resource Description and Access, for example, defines how the physical characteristics of books and other items should be expressed. The Library of Congress Subject Headings (LCSH) are a list of authorized subject terms used to describe the main subject content of the work. Other cataloging rules and classification schedules can also be used.

MARC formats

Name: Description
Authority records: Provide information about individual names, subjects, and uniform titles. An authority record establishes an authorized form of each heading, with references as appropriate from other forms of the heading.
Bibliographic records: Describe the intellectual and physical characteristics of bibliographic resources (books, sound recordings, video recordings, and so forth).
Classification records: MARC records containing classification data. For example, the Library of Congress Classification has been encoded using the MARC 21 Classification format.
Community Information records: MARC records describing a service-providing agency, such as a local homeless shelter or tax assistance provider.
Holdings records: Provide copy-specific information on a library resource (call number, shelf location, volumes held, and so forth).

MARC 21

MARC 21 was designed to redefine the original MARC record format for the 21st century and to make it more accessible to the international community. MARC 21 has formats for five types of data: Bibliographic Format, Authority Format, Holdings Format, Community Format, and Classification Data Format. [3] MARC 21 has been implemented by the British Library, the European institutions, and the major library institutions in the United States and Canada.

MARC 21 is a result of the combination of the United States and Canadian MARC formats (USMARC and CAN/MARC). MARC 21 is based on the NISO/ANSI standard Z39.2, which allows users of different software products to communicate with each other and to exchange data. [3]

MARC 21 allows the use of two character sets, either MARC-8 or Unicode encoded as UTF-8. MARC-8 is based on ISO 2022 and allows the use of Hebrew, Cyrillic, Arabic, Greek, and East Asian scripts. MARC 21 in UTF-8 format allows all the languages supported by Unicode. [4]
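
A reader of MARC 21 records therefore has to determine which of the two character sets a record uses before decoding its fields. The sketch below assumes the MARC 21 convention, not stated in this article, that position 9 of the record leader holds "a" for Unicode (UTF-8) and a blank for MARC-8. The function names and sample leaders are hypothetical.

```python
def character_coding(leader: str) -> str:
    """Report the character set signalled at leader position 9 (assumed convention)."""
    flag = leader[9]
    if flag == "a":
        return "utf-8"
    if flag == " ":
        return "marc-8"
    raise ValueError(f"unexpected character coding flag: {flag!r}")


def decode_field(raw: bytes, leader: str) -> str:
    """Decode field bytes as UTF-8; MARC-8 is left to a dedicated converter."""
    if character_coding(leader) == "utf-8":
        return raw.decode("utf-8")
    # Python's standard library has no MARC-8 codec, so this sketch stops here.
    raise NotImplementedError("MARC-8 decoding requires a separate converter")


# Hypothetical leaders that differ only at position 9.
unicode_leader = "00077nam a2200049 a 4500"
marc8_leader = "00077nam  2200049 a 4500"

print(character_coding(unicode_leader))
print(character_coding(marc8_leader))
print(decode_field("Ελληνικά".encode("utf-8"), unicode_leader))
```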

MARCXML

MARCXML is an XML schema based on the common MARC 21 standards. [5] MARCXML was developed by the Library of Congress and adopted by it and others as a means of facilitating the sharing of, and networked access to, bibliographic information. [5] Because it is easy for a variety of systems to parse, it can serve as an aggregation format, as it does in software packages such as MetaLib, although that package merges it into a wider DTD specification.

The primary design goals of MARCXML are set out in the Library of Congress design considerations. [6]

Future

The future of the MARC formats is a matter of some debate among libraries. On the one hand, the storage formats are quite complex and are based on outdated technology. On the other, there is no alternative bibliographic format with an equivalent degree of granularity. The billions of MARC records in tens of thousands of individual libraries (including over 50,000,000 records belonging to the OCLC consortium alone) create inertia. The Library of Congress has launched the Bibliographic Framework Initiative (BIBFRAME), [7] which aims to provide a replacement for MARC that offers greater granularity and easier re-use of the data expressed in multiple catalogs. [8] Beginning in 2013, OCLC Research exposed data detailing how various MARC elements have been used by libraries in the 400 million MARC records (as of early 2018) contained in WorldCat. [9]

The MARC formats are managed by the MARC Steering Group, which is advised by the MARC Advisory Committee. [10] Proposals for changes to MARC are submitted to the MARC Advisory Committee and discussed in public at the American Library Association (ALA) Midwinter and ALA Annual meetings.

References

  1. Schudel, Matt. "Henriette Avram, 'Mother of MARC,' Dies". Library of Congress. Retrieved June 22, 2013.
  2. McCallum, Sally H. (2002). "MARC: Keystone for Library Automation". IEEE Annals of the History of Computing. 24 (2): 34–49. doi:10.1109/MAHC.2002.1010068.
  3. Joudrey and Taylor, Organization of Information, p. 262.
  4. "Character Sets: MARC-8 Encoding Environment: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media (Library of Congress)". loc.gov.
  5. "MARC 21 XML Schema". Library of Congress. Retrieved 2013-12-11.
  6. "MARC XML Design Considerations". Loc.gov. 2004-12-30. Retrieved 2013-12-11.
  7. "Bibliographic Framework Initiative". Library of Congress. Retrieved 2 February 2013.
  8. "Bibliographic Framework Initiative Update Forum" (BIBFRAME, Library of Congress). Library of Congress. 2013-11-22. Retrieved 2013-12-11.
  9. "MARC Usage in WorldCat". OCLC Research. 2013. Archived from the original on April 14, 2015. Retrieved April 8, 2015.
  10. "MARC Advisory Committee". Library of Congress. Retrieved January 22, 2018.

Further reading