Persistent identifier

A persistent identifier (PI or PID) is a long-lasting reference to a document, file, web page, or other object.

The term "persistent identifier" is usually used in the context of digital objects that are accessible over the Internet. Typically, such an identifier is not only persistent but actionable: [1] you can plug it into a web browser and be taken to the identified source.

Of course, the issue of persistent identification predates the Internet. Over centuries, writers and scholars developed standards for citation of paper-based documents so that readers could reliably and efficiently find a source that a writer mentioned in a footnote or bibliography. After the Internet started to become an important source of information in the 1990s, the issue of citation standards became important in the online world as well. Studies have shown that within a few years of being cited, a significant percentage of web addresses go "dead", [2] [3] a process often called link rot. Using a persistent identifier can slow or stop this process.

An important aspect of persistent identifiers is that "persistence is purely a matter of service". [4] That is, persistent identifiers are only persistent to the degree that someone commits to resolving them for users. No identifier can be inherently persistent; however, many persistent identifiers are created within institutionally administered systems with the aim of maximising longevity.

Some regular URLs (i.e. web addresses) are nonetheless intended by the website owner to be long-lasting; these are often called permalinks.

Examples

People and organisations:

Publications:

Uniform Resource Identifiers:

Combined persistent identifier and archiving functionality is provided by services such as the Internet Archive, perma.cc, archive.today, and WebCite, so that anyone can archive a web page to prevent link rot of a URL. [citation needed]

Related Research Articles

Dublin Core: Standardized set of metadata elements

The Dublin Core, also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen main metadata items for describing digital or physical resources. The Dublin Core Metadata Initiative (DCMI) is responsible for formulating the Dublin Core; DCMI is a project of the Association for Information Science and Technology (ASIS&T), a non-profit organization.

A Uniform Resource Identifier (URI) is a unique sequence of characters that identifies a logical or physical resource used by web technologies. URIs may be used to identify anything, including real-world objects, such as people and places, concepts, or information resources such as web pages and books. Some URIs provide a means of locating and retrieving information resources on a network; these are Uniform Resource Locators (URLs). A URL provides the location of the resource. A URI identifies the resource by name at the specified location or URL. Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it; these are Uniform Resource Names (URNs). The web technologies that use URIs are not limited to web browsers. URIs are used to identify anything described using the Resource Description Framework (RDF). For example, concepts that are part of an ontology defined using the Web Ontology Language (OWL) and people who are described using the Friend of a Friend vocabulary would each have an individual URI.

In computer science, an object can be a variable, a data structure, a function, or a method. As regions of memory, objects contain a value and are referenced by identifiers.

A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme. URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable. URNs cannot be used to directly locate an item and need not be resolvable, as they are simply templates that another parser may use to find an item.
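As a rough illustration of the namespace structure described above, the following Python sketch splits a URN into its namespace identifier (NID) and namespace-specific string (NSS). The function name `parse_urn`, the `Urn` type, and the example ISBN are assumptions made for illustration; the pattern is a simplification of the general `urn:<NID>:<NSS>` form and deliberately ignores optional components that a full parser would handle.

```python
import re
from typing import NamedTuple


class Urn(NamedTuple):
    """Hypothetical container for the two main parts of a URN."""
    nid: str  # namespace identifier, e.g. "isbn"
    nss: str  # namespace-specific string, e.g. "0451450523"


# Simplified pattern: "urn:", a 2-32 character NID made of letters, digits
# and hyphens (not starting or ending with a hyphen), then a non-empty NSS.
_URN_RE = re.compile(
    r"^urn:([a-z0-9][a-z0-9-]{0,30}[a-z0-9]):(.+)$", re.IGNORECASE
)


def parse_urn(text: str) -> Urn:
    """Split a URN into NID and NSS, raising ValueError if it does not match."""
    match = _URN_RE.match(text)
    if match is None:
        raise ValueError(f"not a URN: {text!r}")
    # The "urn" prefix and the NID are case-insensitive, so normalise the NID.
    return Urn(nid=match.group(1).lower(), nss=match.group(2))


if __name__ == "__main__":
    print(parse_urn("urn:isbn:0451450523"))
```

Note that parsing only names the resource; finding a copy of it still requires a separate resolution service for the namespace in question.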

Link rot: Phenomenon of URLs tending to cease functioning

Link rot is the phenomenon of hyperlinks tending over time to cease to point to their originally targeted file, web page, or server due to that resource being relocated to a new address or becoming permanently unavailable. A link that no longer points to its target, often called a broken, dead, or orphaned link, is a specific form of dangling pointer.

Digital object identifier: ISO standard unique string identifier for a digital object

A digital object identifier (DOI) is a persistent identifier or handle used to uniquely identify various objects, standardized by the International Organization for Standardization (ISO). DOIs are an implementation of the Handle System; they also fit within the URI system. They are widely used to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications. DOIs have also been used to identify other types of information resources, like commercial videos.
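To show how a DOI becomes "actionable" in a browser, here is a minimal Python sketch that turns a DOI string into a URL at the public `https://doi.org/` resolver. The helper name `doi_to_url`, the list of prefixes it strips, and the loose validity check (starts with `10.` and contains a slash) are assumptions for illustration, not a full implementation of the DOI syntax.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations (an assumed, non-exhaustive list).
_KNOWN_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI such as '10.1000/182'."""
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check only: a DOI has a "10." prefix and a "/" separator.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))
```

The identifier itself stays the same even if the publisher moves the object; only the resolver's record of the current location needs to change.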

A permalink or permanent link is a URL that is intended to remain unchanged for many years into the future, yielding a hyperlink that is less susceptible to link rot. Permalinks are often rendered simply, that is, as clean URLs, to be easier to type and remember. Most modern blogging and content-syndication software systems support such links. Sometimes URL shortening is used to create them.

REST is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of a distributed, Internet-scale hypermedia system, such as the Web, should behave. The REST architectural style emphasises uniform interfaces, independent deployment of components, the scalability of interactions between them, and creating a layered architecture to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

A persistent uniform resource locator (PURL) is a uniform resource locator (URL) that is used to redirect to the location of the requested web resource. PURLs redirect HTTP clients using HTTP status codes.
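The redirect behaviour can be sketched as a small in-memory lookup table in Python. The class `PurlResolver`, its methods, and the example paths and targets are hypothetical; a real PURL service sits behind an HTTP server and may use other redirect status codes. The sketch only shows the core idea that the persistent path stays fixed while its target can be updated.

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Toy resolver mapping stable paths to their current target URLs."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, path: str, target: str) -> None:
        """Create or update the target that a persistent path points to."""
        self._targets[path] = target

    def resolve(self, path: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status code, Location header value or None)."""
        target = self._targets.get(path)
        if target is None:
            return 404, None   # unknown identifier
        return 302, target     # redirect the client to the current location


if __name__ == "__main__":
    resolver = PurlResolver()
    resolver.register("/docs/report", "https://old.example.org/report.html")
    print(resolver.resolve("/docs/report"))
    # The resource moves; only the table entry changes, not the PURL itself.
    resolver.register("/docs/report", "https://new.example.org/r/1")
    print(resolver.resolve("/docs/report"))
```

Keeping the mapping up to date is a maintenance commitment by whoever runs the resolver, which is why the persistence of such identifiers depends on the service behind them.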

Web Services Resource Framework (WSRF) is a family of OASIS-published specifications for web services. Major contributors include the Globus Alliance and IBM.

An OpenURL is similar to a web address, but instead of referring to a physical website, it refers to an article, book, patent, or other resource within a website.

A web resource is any identifiable resource present on or connected to the World Wide Web. Resources are identified using Uniform Resource Identifiers (URIs). In the Semantic Web, web resources and their semantic properties are described using the Resource Description Framework (RDF).

A unique identifier (UID) is an identifier that is guaranteed to be unique among all identifiers used for those objects and for a specific purpose. The concept was formalized early in the development of computer science and information systems. In general, it was associated with an atomic data type.

Internet resource locators, described in RFC 1736, convey location and access information for resources. Typical examples of resources include network accessible documents, WAIS databases, FTP servers, and Telnet destinations.

In IETF specifications, a Uniform Resource Characteristic (URC) is a string of characters representing the metadata of a Uniform Resource Identifier (URI), a string identifying a Web resource. URC metadata was envisioned to include sufficient information to support persistent identifiers, such as mapping a Uniform Resource Name (URN) to a current Uniform Resource Locator (URL). URCs were proposed as a specification in the mid-1990s, but were never adopted.

Metadata: Data about data

Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including:

The Handle System is the Corporation for National Research Initiatives's proprietary registry assigning persistent identifiers, or handles, to information resources, and for resolving "those handles into the information necessary to locate, access, and otherwise make use of the resources".

Archival Resource Key: Form of URLs used as persistent identifiers

An Archival Resource Key (ARK) is a multi-purpose URL suited to being a persistent identifier for information objects of any type. It is widely used by libraries, data centers, archives, museums, publishers, and government agencies to provide reliable references to scholarly, scientific, and cultural objects. In 2019 it was registered as a Uniform Resource Identifier (URI).

Identifiers.org is a project providing stable and perennial identifiers for data records used in the life sciences. The identifiers are provided in the form of Uniform Resource Identifiers (URIs). Identifiers.org is also a resolving system that relies on collections listed in the MIRIAM Registry to provide direct access to different instances of the identified records.

A Uniform Resource Locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs most commonly reference web pages (HTTP/HTTPS) but are also used for file transfer (FTP), email (mailto), database access (JDBC), and many other applications.

References

  1. John A. Kunze, "Towards Electronic Persistence Using ARK Identifiers," section 3, California Digital Library
  2. Sanderson, Robert; Phillips, Mark; Van de Sompel, Herbert (2011). "Analyzing the Persistence of Referenced Web Resources with Memento". arXiv:1105.3459 [cs.DL].
  3. Bugeja, Michael (2010). Vanishing Act: The Erosion of Online Footnotes and Implications for Scholarship in the Digital Age. ISBN 978-1936117147.
  4. Kunze, J. "The ARK Identifier Scheme".
  5. "On constructing persistent identifiers with persistent resolution targets". IEEE Conference Publication. Retrieved 8 April 2018.