LOCKSS

Developer(s) Stanford University
Stable release 1.78.3 [1] / 1 August 2024
License BSD-style
Website www.lockss.org

The LOCKSS ("Lots of Copies Keep Stuff Safe") project, under the auspices of Stanford University, is a peer-to-peer network that develops and supports an open-source system allowing libraries to collect, preserve, and provide their readers with access to material published on the Web. Its main goal is digital preservation.

The system attempts to replicate the way libraries do this for material published on paper. It was originally designed for scholarly journals, [2] but is now also used for a range of other materials. Examples include the SOLINET project to preserve theses and dissertations at eight universities, [3] US government documents, [4] and the MetaArchive Cooperative program preserving at-risk digital archival collections, including Electronic Theses and Dissertations (ETDs), newspapers, photograph collections, and audio-visual collections. [5] [6]

A similar project called CLOCKSS (Controlled LOCKSS) "is a tax-exempt, 501(c)(3), not-for-profit organization, governed by a Board of Directors made up of librarians and publishers." [7] CLOCKSS runs on LOCKSS technology. [8]

Problem

Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, providing their readers access to the content received even after the publisher has ceased publication or the subscription has been canceled. [9] In the digital age, libraries often subscribe to journals that are available only digitally over the Internet. Although convenient for patron access, the model for digital subscriptions does not allow the libraries to retain a copy of the journal. If the publisher ceases to publish, the library cancels the subscription, or the publisher's website is down for the day, the content that has been paid for is no longer available.

Methods

The LOCKSS system allows a library, with permission from the publisher, to collect, preserve, and disseminate to its patrons a copy of the materials to which it has subscribed, as well as open access material (perhaps published under a Creative Commons license). Each library's system collects a copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP. Libraries that have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes computed over the preserved content and a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or other peers. [10] [11]
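The voting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the LOCKSS implementation: the function names `vote` and `run_poll`, the choice of SHA-256, and the simple plurality count over every peer are assumptions made for the example, whereas the actual protocol uses rate-limited sampled voting. [10] Including a fresh nonce in the hash means a peer has to hash the content it currently holds rather than replay a digest from an earlier poll.

```python
import hashlib
import os
from collections import Counter


def vote(nonce: bytes, content: bytes) -> str:
    """Hash the poll's nonce together with a peer's copy of the content."""
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(copies: dict[str, bytes], nonce: bytes) -> tuple[str, list[str]]:
    """Tally one vote per peer; return the winning digest and the outvoted peers.

    Assumption: a plain plurality wins. Ties are not handled in this sketch.
    """
    votes = {peer: vote(nonce, content) for peer, content in copies.items()}
    winner, _ = Counter(votes.values()).most_common(1)[0]
    losers = sorted(peer for peer, digest in votes.items() if digest != winner)
    return winner, losers


if __name__ == "__main__":
    # Hypothetical peers holding the same item; one copy has been damaged.
    copies = {
        "library_a": b"Article text, volume 1",
        "library_b": b"Article text, volume 1",
        "library_c": b"Article text, volume 1",
        "library_d": b"Article t3xt, volume 1",
    }
    nonce = os.urandom(16)  # fresh random nonce for this poll
    winner, losers = run_poll(copies, nonce)

    # An outvoted peer treats its copy as damaged and repairs it from a peer
    # that voted with the majority.
    good_peer = next(peer for peer in copies if peer not in losers)
    for peer in losers:
        copies[peer] = copies[good_peer]
    print("repaired:", losers)
```

In this sketch the damaged peer simply copies from any majority voter; in the system described here, a repair may also come from the publisher.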

The LOCKSS license used by most publishers allows a library's readers access to its own copy, but does not allow similar access to other libraries or unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library has proved in the past that it had a good copy by voting with the majority. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format. [12] These limits on the use that may be made of preserved copies of copyrighted material have been effective in persuading copyright owners to grant the necessary permission. [13]
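The repair rule in the paragraph above (content is supplied only to a library that has previously voted with the majority on that item) amounts to a small piece of bookkeeping. The following Python sketch shows one way to express it; the class `RepairPolicy`, its method names, and the item identifiers are hypothetical and are not taken from the LOCKSS code base.

```python
from dataclasses import dataclass, field


@dataclass
class RepairPolicy:
    """Remembers, per preserved item, which peers have agreed with the majority."""

    agreed: dict[str, set[str]] = field(default_factory=dict)

    def record_poll(self, item: str, majority_voters: list[str]) -> None:
        """Note the peers that voted with the majority in a poll on `item`."""
        self.agreed.setdefault(item, set()).update(majority_voters)

    def may_repair(self, item: str, requester: str) -> bool:
        """Allow a repair only if the requester once proved it held a good copy."""
        return requester in self.agreed.get(item, set())


if __name__ == "__main__":
    policy = RepairPolicy()
    policy.record_poll("journal-vol-1", ["library_a", "library_b"])
    print(policy.may_repair("journal-vol-1", "library_a"))
    print(policy.may_repair("journal-vol-1", "library_x"))
```

Keying the record by item matters: agreeing on one item in the past says nothing about whether a peer ever held a good copy of another, so the check is made per item rather than per peer.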

The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access.

Since each library administers its own LOCKSS peer and maintains its own copy of preserved material, and since there are libraries doing so worldwide, the system provides a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content. [14]

Importance

In addition to preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium under many independent administrations meant that attempts to alter or remove all copies of a published work would likely both fail and be detected. Web publishing, based on a single copy under a single administration, provides none of these safeguards against subversion. Web publishing is, therefore, an amenable tool for rewriting history. By preserving many copies under diverse administration, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards in the now digital world of publication.

Implementation

Prior to implementing a LOCKSS system, some questions need to be considered carefully in order to make sure the content is verified, evaluated, and auditable by users. The user must ask questions such as, "What are your procedures?", "What are your methods?", "How is this system evaluated?", and "What is your disaster preparedness program?". These questions will enable the user to evaluate the system, create a successful maintenance plan for their materials, and enable the system to be reinforced by a carefully evaluated support structure.

The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from GitHub. [15] LOCKSS is a trademark of Stanford University.

References

  1. "Release 1.78.3". 1 August 2024. Retrieved 22 August 2024.
  2. David S. H. Rosenthal; Vicky Reich (June 18, 2000). Permanent Web Publishing (PDF). Proceedings of FREENIX Track: 2000 USENIX Annual Technical Conference. Retrieved 2008-01-19.
  3. "ASERL and LOCKSS to Preserve e-Theses & Dissertations" (Press release). SOLINET. July 11, 2005. Retrieved 2008-01-19.
  4. Jacobs, James. "LOCKSS-USDOCS". home page. LOCKSS. Retrieved 23 February 2012. [permanent dead link]
  5. "The MetaArchive Cooperative". Home page. Retrieved 2008-01-19.
  6. Maniatis, Petros; Rosenthal, David S. H.; Roussopoulos, Mema; Baker, Mary; Giuli, TJ; Muliadi, Yanto (2003). "Preserving peer replicas by rate-limited sampled voting". Proceedings of the nineteenth ACM symposium on Operating systems principles - SOSP '03 (PDF). p. 44. arXiv:cs/0303026. doi:10.1145/945445.945451. ISBN 978-1581137576. S2CID 215753435. Free PDF download.
  7. CLOCKSS.org (2015), CLOCKSS.org.
  8. CLOCKSS.org (2008-02-14), What's the difference between LOCKSS and CLOCKSS? (PDF), retrieved 2015-11-21.
  9. "Preservation Principles - LOCKSS". Archived from the original on 2018-12-01. Retrieved 2013-03-07.
  10. Petros Maniatis; Mema Roussopoulos; TJ Giuli; David S. H. Rosenthal; Mary Baker; Yanto Muliadi (October 19, 2003). "Preserving Peer Replicas By Rate-Limited Sampled Voting" (PDF). ACM Symposium on Operating Systems Principles. Archived from the original (PDF) on January 28, 2005. Retrieved May 2, 2022.
  11. T.J. Giuli; Petros Maniatis; Mary Baker; David S. H. Rosenthal; Mema Roussopoulos (November 27, 2004). "Attrition Defenses for a Peer-to-Peer Digital Preservation System". arXiv:cs.CR/0405111.
  12. David S. H. Rosenthal; Thomas Lipkis; Thomas S. Robertson; Seth Morabito (January 2005). "Transparent Format Migration of Preserved Web Content". D-Lib Magazine. 11 (1). arXiv:cs/0411077. doi:10.1045/january2005-rosenthal. S2CID 474. Retrieved 2008-01-19.
  13. "Publishers and Titles". LOCKSS. Archived from the original on 2008-01-08. Retrieved 2008-01-19.
  14. David S. H. Rosenthal & Daniel L. Vargas (September 11, 2012). "LOCKSS Boxes in the Cloud" (PDF). Retrieved October 11, 2013.
  15. "LOCKSS (Lots of Copies Keep Stuff Safe)". Project web site. GitHub. Retrieved September 12, 2017.
  16. Rosenthal, David S. H.; Reich, Vicky (2000-06-18). "Permanent web publishing" (PDF). Proceedings of the Annual Conference on USENIX Annual Technical Conference. ATEC '00. USA: USENIX Association: 40. Archived from the original on 18 June 2000. The project was to a large extent inspired by Danny Hillis & Stewart Brand's Millennium Clock project[clock].
