International Internet Preservation Consortium

Web Curator Tool
Original author(s)	National Library of New Zealand / British Library
Developer(s)	Oakleigh Consulting
Initial release	September 2006
Stable release	1.6.1 / May 9, 2014
Platform	Java
Type	Selective web harvesting
License	Apache License V2.0
Website	webcuratortool.org

International Internet Preservation Consortium
Abbreviation	IIPC
Formation	July 2003
Purpose	Acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.
Website	http://netpreserve.org/

Last updated May 03, 2024

The International Internet Preservation Consortium (IIPC) is an international organization of libraries and other institutions established to coordinate efforts to preserve internet content for the future.^[2] It was founded in July 2003 by 12 participating institutions^[1] and had grown to 35 members by January 2010.^[3] As of January 2022, it had 52 members.

Members

National libraries

Participating national libraries and archives include:^[7]

Participating organizations

Other participating organizations include:^[7]

Archiefweb.eu
Arquivo.pt - Portuguese Web Archive
Columbia University Libraries
Hanzo
Harvard Library
Internet Archive
Institut national de l'audiovisuel
Los Alamos National Laboratory Research Library
Netherlands Institute for Sound and Vision
Old Dominion University Department of Computer Science
Stanford University Libraries
University Library, Bratislava
University of North Texas Libraries

Past members

WebCite used to be, but is no longer, a member of the IIPC.^[8] In a 2012 message, its founder Gunther Eysenbach commented that "WebCite has no funding, and IIPC charges 4000 Euro/yr in membership fees."^[9]

Projects

The IIPC sponsors and collaborates on a number of different projects with its member organizations.

Current projects

Support for transitioning to pywb (Python Wayback).^[10]
Collaborative Collections: IIPC members are collaborating to build public web archive collections based on transnational themes or events of mutual interest. Topics of existing collections include: European Refugee Crisis, Intergovernmental Organizations, Olympics, World War I Commemoration, Climate Change, Artificial Intelligence, and Novel Coronavirus (COVID-19).^[11]
Memento: aggregate metadata of the IIPC archives and provide access to Memento.^[12]

The IIPC also maintains an electronic mailing list open to anyone interested in web harvesting, archiving, and quality maintenance.^[13]

Past projects

Developing Bloom Filters for Web Archives’ Holdings.^[14]
Improving the Dark and Stormy Archives Framework by Summarizing the Collections of the National Library of Australia.^[15]
LinkGate: Core Functionality and Future Use Cases.^[16]
Asking questions with web archives – introductory notebooks for historians: The project output is a set of 16 Jupyter notebooks that demonstrate how specific historical research questions can be explored by analysing data from web archives.^[17]^[18]^[19]
IIPC sponsored a project on "cross-archival search strategies" which included the creation of an archive focused on the 2010 Winter Olympics.^[20]
Starting in 2006, the National Library of New Zealand and the British Library developed the Web Curator Tool, an open-source workflow management system for selective web archiving.^[21] Version 1.6 was released on December 5, 2012, and is available on SourceForge.^[22] The Web Curator Tool is built on Java technologies such as Apache Tomcat, the Spring Framework and Hibernate, and on Internet Archive technologies such as the Heritrix web archiving crawler, the NutchWAX web archive full-text search engine and the Wayback Machine.^[23]
IIPC Web Archiving Doctoral Support Award: grant to provide three years of funding for a student to earn a PhD in Interdisciplinary Information Science at The University of North Texas College of Information.^[24]
IIPC Member Staff Exchange: onsite training by experts for participating IIPC members on using the Heritrix 3 web crawler.^[25]
Working group on Statistics and Quality Indicators for Web Archiving: development of guidelines on the management and evaluation of Web archiving activities and products.^[26]

Related Research Articles

The Internet Archive is an American nonprofit digital library founded in 1996 by Brewster Kahle. It provides free access to collections of digitized materials including websites, software applications, music, audiovisual and print materials. The Archive also advocates for a free and open Internet. As of February 4, 2024, the Internet Archive holds more than 44 million print materials, 10.6 million videos, 1 million software programs, 15 million audio files, 4.8 million images, 255,000 concerts, and over 835 billion web pages in its Wayback Machine. Its stated mission is to provide "universal access to all knowledge".

An archive is an accumulation of historical records or materials – in any medium – or the physical facility in which they are located.

Landsbókasafn Íslands – Háskólabókasafn is the national library of Iceland which also functions as the university library of the University of Iceland. The library was established on December 1, 1994, in Reykjavík, Iceland, with the merger of the former national library, Landsbókasafn Íslands, and the university library. It is the largest library in Iceland with about one million items in various collections. The library's largest collection is the national collection containing almost all written works published in Iceland and items related to Iceland published elsewhere. The library is the main legal deposit library in Iceland. The library also has a large manuscript collection with mostly early modern and modern manuscripts, and a collection of published Icelandic music and other audio. The library houses the largest academic collection in Iceland, most of which can be borrowed for off-site use by holders of library cards. University students get library cards for free, but anyone can acquire a card for a small fee. The library is open for public access.

The UK Web Archive is a consortium of the six UK legal deposit libraries which aims to collect all UK websites at least once each year.

Heritrix is a web crawler designed for web archiving. It was written by the Internet Archive. It is available under a free software license and written in Java. The main interface is accessible using a web browser, and there is a command-line tool that can optionally be used to initiate crawls.

The National Digital Information Infrastructure and Preservation Program (NDIIPP) of the United States was an archival program led by the Library of Congress to archive and provide access to digital resources. The program convened several working groups, administered grant projects, and disseminated information about digital preservation issues. The U.S. Congress established the program in 2000, and official activity specific to NDIIPP itself wound down between 2016 and 2018. The Library was chosen because of its role as one of the leading providers of high-quality content on the Internet. The Library of Congress has formed a national network of partners dedicated to preserving specific types of digital content that is at risk of loss.

Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Wayback Machine, which strives to maintain an archive of the entire Web.

WebCite is an on-demand archive site, designed to digitally preserve scientific and educationally important material on the web by taking snapshots of Internet contents as they existed at the time a blogger or scholar cited or quoted from them. The preservation service enables verification of claims supported by the cited sources even when the original web pages are revised, removed, or disappear for other reasons, an effect known as link rot.

The conservation and restoration of new media art is the study and practice of techniques for sustaining new media art created using materials such as digital, biological, performative, and other variable media.

IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history.

The Digital Preservation Award is an international award sponsored by the Digital Preservation Coalition. The award 'recognises the many new initiatives being undertaken in the challenging field of digital preservation'. It was inaugurated in 2004 and was initially presented as part of the Institute of Conservation's Conservation Awards. Since 2012 the prize, which includes a trophy and a cheque, has been presented independently. Awards ceremonies have taken place at the British Library, the British Museum and the Wellcome Trust.

PADICAT, an acronym for Patrimoni Digital de Catalunya (Catalan for "Digital Heritage of Catalonia"), is the web archive of Catalonia.

Webarchiv is a digital archive of important Czech web resources, which are collected with the aim of their long-term preservation.

Trove is an Australian online library database owned by the National Library of Australia, which operates it in partnership with source providers National and State Libraries Australia. It is an aggregator and service that includes full-text documents, digital images, bibliographic and holdings data for items that are not available digitally, and a free faceted-search engine as a discovery tool.

The German National Library of Science and Technology, abbreviated TIB, is the national library of the Federal Republic of Germany for all fields of engineering, technology, and the natural sciences. It is jointly funded by the Federal Ministry of Education and Research (BMBF) and the 16 German states. Founded in 1959, the library operates in conjunction with the Leibniz Universität Hannover. In addition to acquiring scientific literature, it conducts applied research in such areas as the archiving of non-textual materials, data visualization and the future Internet. The library is also involved in a number of open access initiatives. With a collection of about 8.9 million items in 2012, the TIB is the largest technology and natural science library in the world.

The Internet Memory Foundation was a non-profit foundation whose purpose was archiving content of the World Wide Web. It supported projects and research that included the preservation and protection of digital media content in various forms to form a digital library of cultural content. As of August 2018, it is defunct.

Project Jupyter is a project to develop open-source software, open standards, and services for interactive computing across multiple programming languages.

A collective collection, shared collection, or shared print program is a joint effort by multiple academic or research libraries to house, manage, and provide access to their collective physical collections. Most shared print programs focus on collections of monographs and/or serials. Similar efforts have addressed acquisition and/or retention of microform, federal government documents, and digital collections. Shared print programs often have activities in common with national repositories and archiving programs. Discussions surrounding shared print programs in their current form have come to the forefront as a popular solution to shrinking collection budgets, rising costs of resources, and competing space needs.

References

[1] "Mission & Goals | IIPC". www.netpreserve.org. International Internet Preservation Consortium. Archived from the original on 2017-06-06. Retrieved 2015-09-12.
[2] "International Internet Preservation Consortium" (Press release). International Internet Preservation Consortium. May 5, 2004. Archived from the original on May 1, 2012.
[3] "Web Archives Registry Launched". News & Events. Library of Congress. January 29, 2010. Archived from the original on April 8, 2011. Retrieved 2011-04-17.
[4] Hiiragi, Wasuke; Shigeo Sugimoto; Tetsuo Sakaguchi. "Web archiving in the world - International Internet Preservation Consortium (IIPC) and their activities". The Journal of Information Science and Technology Association. 58 (8). Japan.
[5] "Web Curator Tool". SourceForge.net. Archived from the original on 13 October 2014. Retrieved 25 February 2021.
[6] "Web Curator Tool". sourceforge.net. Archived from the original on 13 November 2014. Retrieved 25 February 2021.
[7] "Members". International Internet Preservation Consortium. 2020.
[8] "WebCite Consortium FAQ". webcitation.org. WebCite. Archived from the original on 2008-08-28.
[9] "Twitter post". 2012-06-11. Archived from the original on 2014-01-07. Retrieved 2013-03-10.
[10] "Support for transitioning to pywb". International Internet Preservation Consortium. Retrieved 24 January 2021.
[11] "Collaborative Collections". International Internet Preservation Consortium. Retrieved 24 January 2021.
[12] "Memento". International Internet Preservation Consortium. Retrieved 17 March 2014.
[13] "Web Curators Mailing List". International Internet Preservation Consortium. Archived from the original on 2014-01-25. Retrieved 2017-10-17.
[14] "Developing Bloom Filters for Web Archives' Holdings". International Internet Preservation Consortium. Retrieved 24 January 2021.
[15] "Improving the Dark and Stormy Archives Framework by Summarizing the Collections of the National Library of Australia". International Internet Preservation Consortium. Retrieved 24 January 2021.
[16] "LinkGate: Core Functionality and Future Use Cases". International Internet Preservation Consortium. Retrieved 24 January 2021.
[17] "Asking questions with web archives – introductory notebooks for historians". International Internet Preservation Consortium. Retrieved 24 January 2021.
[18] "Web Archives". GLAM Workbench. Retrieved 24 January 2021.
[19] "IIPC RSS webinar: Tim Sherratt: Jupyter notebooks for web archives". International Internet Preservation Consortium. Archived from the original on 2021-12-20. Retrieved 24 January 2021.
[20] "2010 Winter Olympics". California Digital Library. 2010. Archived from the original on 2011-09-02.
[21] "Web Curator Tool". National Library of New Zealand. Archived from the original on 2011-05-22. Retrieved 2011-04-17.
[22] "The Web Curator Tool Release History". SourceForge. Archived from the original on 2013-02-27. Retrieved 2013-03-10.
[23] "British Library - Developing Enhancements to the Web Curator Tool". Oakleigh Consulting. Retrieved 2011-04-17.
[24] "PhD Sponsorship". International Internet Preservation Consortium. Archived from the original on 17 October 2014. Retrieved 17 March 2014.
[25] "Staff Exchange". International Internet Preservation Consortium. Archived from the original on 7 November 2014. Retrieved 17 March 2014.
[26] "Statistics and Quality Indicators for Web Archiving". International Internet Preservation Consortium. Archived from the original on 7 November 2014. Retrieved 17 March 2014.

External links

International Internet Preservation Consortium

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

