Archival Resource Key

Acronym: ARK
Organisation: ARK Alliance
Introduced: 2001
No. issued: 8.2 billion
No. of digits: variable
Check digit: NCDA, optional
Example: ark:/53355/cl010066723
Website: arks.org

An Archival Resource Key (ARK) is a multi-purpose URL suited to being a persistent identifier for information objects of any type. It is widely used by libraries, data centers, archives, museums, publishers, and government agencies to provide reliable references to scholarly, scientific, and cultural objects. In 2019 it was registered as a Uniform Resource Identifier (URI) scheme. [1]

A URL that is an ARK is distinguished by the label ark: after the URL's hostname. The label sets the expectation that, when the URL is submitted to a web browser with '?' appended, it returns a brief metadata record, and that with '??' appended it returns metadata that includes a commitment statement from the current service provider. The ARK and its inflections ('?' and '??') thus provide access to three facets of a provider's ability to provide persistence: the object itself, its metadata, and the provider's commitment.
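As a rough illustration of the inflections described above, the sketch below derives the three request forms from a single ARK URL. The function name `ark_inflections` and the dictionary keys are hypothetical labels chosen for this example, not terms from the ARK specification, and `example.org` is a placeholder hostname combined with the example identifier from the infobox.

```python
def ark_inflections(ark_url: str) -> dict[str, str]:
    """Return the plain ARK URL together with its '?' and '??' inflections.

    The keys ("object", "metadata", "commitment") are illustrative names
    for the three facets; they are not defined by the ARK scheme.
    """
    # Strip any inflection already present so repeated calls give the same result.
    base = ark_url.rstrip("?")
    return {
        "object": base,              # the resource, or a surrogate for it
        "metadata": base + "?",      # brief metadata record
        "commitment": base + "??",   # metadata including a commitment statement
    }


# Placeholder hostname; the identifier is the infobox example.
urls = ark_inflections("https://example.org/ark:/53355/cl010066723")
for facet, url in urls.items():
    print(f"{facet}: {url}")
```

Whether a given server actually answers the inflected forms depends on the provider; the sketch only constructs the URLs.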

Implicit in the design of the ARK scheme is that persistence is purely a matter of service and not a property of a naming syntax. Also implicit is that a "persistent identifier" cannot be born persistent; an identifier from any scheme can only be proved persistent over time. The inflections provide information with which to judge an identifier's likelihood of persistence.

History

Throughout the 1990s, the Internet Engineering Task Force and other organizations developed standards for persistent identifiers for web resources, including URN, PURL, Handle, and DOI. In each of these standards, indirect identifiers would resolve to URLs, which themselves changed over time. Many believed that such systems would contribute to the persistence of web resources over time. [2]

In 2001, John Kunze of the University of California and R. P. Channing Rodgers of the United States National Library of Medicine released the first draft of “The ARK Persistent Identifier Scheme,” designed in response to the needs of their two organizations, as an IETF working document. [3] In explaining their motivations for creating a new system, Kunze later wrote that “each [persistent identifier] system had specific problems.” In contrast to the decentralized structure of the web, with many independent publishers, Handle and DOI were related centralized systems which charged for inclusion; they were “antithetical,” according to Kunze, “to an implicit principle that Internet standards must not endorse control by any one entity, over access to the networked resources of another entity.” URNs were free, but lacked a resolver discovery service, and, wrote Kunze, “it seemed to me that the IETF community lost interest in creating a whole new Internet indirection infrastructure that would add little to existing web and DNS mechanisms, especially in light of the small part that indirection plays in keeping links from breaking.” [2]

In contrast to these other systems, the ARK scheme proposed that “persistence is purely a matter of service,… neither inherent in an object nor conferred on it by a particular naming syntax.” The most an identifier could do to solve the problem of persistence, then, was to indicate an organization’s commitment. Accordingly, in the ARK standard, identifiers would refer not only to a web resource, but also to “a promise of stewardship” and metadata about the resource. If a web server was queried with an ARK, it should return the resource itself or some surrogate for it, such as “a table of contents instead of a large complex document.” If a question mark was appended to the ARK, though, it should instead return a description (metadata), which “must at minimum answer the who, what, when, and why questions concern[ing] an expression of the object.” (The scheme also included a guide to Electronic Resource Citations, a simple format for structuring this metadata.) If two question marks were appended, the server should return the provider’s policies regarding “object persistence, object naming, object fragment addressing, and operational service support.” [3]

California Digital Library began using ARKs in 2002, and released the Noid (Nice Opaque IDentifiers) software for managing ARKs and other identifiers in 2004. Other early adopters of ARKs included Portico, the Internet Archive, and the Bibliothèque nationale de France, the first of several francophone institutions to adopt the scheme.

In 2018, the California Digital Library and DuraSpace announced a collaboration, initially named ARKs-in-the-Open and later the ARK Alliance, to build an international community around ARKs and their use in open scholarship. By 2021, over 800 institutions had registered to use ARKs. [2]

Structure

https://NMA/ark:/NAAN/Name[Qualifier]
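The template above can be read as a hostname (NMA), the ark: label, a NAAN, a Name, and an optional Qualifier. The following is a minimal sketch of splitting a URL along those lines, not a reference parser. The names `ParsedArk` and `parse_ark_url` are hypothetical, the second URL in the usage lines is invented for illustration, and the regular expression assumes that the Name ends at the first slash or dot, in line with the notes that hierarchy qualifiers begin with a slash and variant qualifiers with a dot.

```python
import re
from typing import NamedTuple, Optional


class ParsedArk(NamedTuple):
    nma: str        # hostname that serves the ARK
    naan: str       # Name Assigning Authority Number
    name: str       # name assigned by that authority
    qualifier: str  # optional part starting with "/" or "."; "" if absent


# Assumption: only the "https://NMA/ark:/NAAN/Name[Qualifier]" form is handled,
# and the Name stops at the first "/" or "." that follows it.
_ARK_URL = re.compile(
    r"^https?://(?P<nma>[^/]+)/ark:/(?P<naan>[^/]+)/"
    r"(?P<name>[^/.?]+)(?P<qualifier>[^?]*)"
)


def parse_ark_url(url: str) -> Optional[ParsedArk]:
    """Split an ARK URL into its components, or return None if it does not match."""
    match = _ARK_URL.match(url)
    if match is None:
        return None
    return ParsedArk(**match.groupdict())


# Placeholder hostname with the infobox example identifier.
print(parse_ark_url("https://example.org/ark:/53355/cl010066723"))
# Invented identifier with a hierarchy qualifier and a variant suffix.
print(parse_ark_url("https://example.org/ark:/12345/x6np1wh8k/c3.pdf"))
```

Any trailing '?' or '??' inflection is left out of the qualifier, so the same function can be applied to inflected URLs.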

A complete NAAN registry [6] is maintained by the ARK Alliance and replicated at the Bibliothèque Nationale de France and the US National Library of Medicine. It contained 530 entries in June 2018, 633 in July 2020, and 754 in April 2021.

Application

ARKs may be assigned to anything digital, physical, or abstract. Below are examples, as reported (2020) to the ARK Alliance by the linked organizations.

Generic services

Three generic ARK services have been defined. They are described below in protocol-independent terms. Delivering these services may be implemented through many possible methods given available technology (today's or future).

Access service (access, location)

Policy service (permanence, naming, etc.)

Description service

Notes and references

  1. "Uniform Resource Identifier (URI) Schemes".
  2. Meyerl, Jordan (September 14, 2021). "ARK Alliance: An Interview with John Kunze". bloggERS. Society of American Archivists Electronic Resources Section.
  3. Kunze, J.; Rodgers, R. P. C. (March 8, 2001). "The ARK Persistent Identifier Scheme". IETF Datatracker. Internet Engineering Task Force.
  4. Hierarchy qualifiers begin with a slash character.
  5. Variant qualifiers begin with a dot character.
  6. Name Assigning Authority Number registry
