The UK Web Archive is a consortium of the six UK legal deposit libraries which aims to collect all UK websites at least once each year. [1]
UK Web Archive | |
---|---|
Established | 2005 |
Reference to legal mandate | Yes, provided in law by: |
In 2005, the British Library, The National Archives, Wellcome Trust, National Library of Scotland, National Library of Wales and JISC formed the UK Web Archiving Consortium, a project to archive websites. [3]
UKWAC archived selected websites by licence or permission, using PANDAS software developed by the National Library of Australia. During the project its members collected sites relevant to their interests: the Wellcome Library collected medical sites, and the national libraries collected sites that reflect life in contemporary Wales or Scotland. The British Library worked with a broad policy of collecting sites of cultural, historical and political importance to the UK. [4]
The Consortium was wound up in 2010. The Archiving and Preservation Working Group took over UKWAC's co-ordinating role for web archiving in the UK. The Digital Preservation Coalition hosts the working group. [5]
The archive undertakes an annual crawl of .uk and other UK geographic Top Level Domains such as .scot, .cymru or .london.
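The domain test behind such a crawl can be sketched in Python. This is a minimal illustration and not the archive's actual crawl configuration: the `UK_TLDS` set holds only the four domains named above, and the function name `in_domain_scope` is hypothetical. A real crawl scope is likely to apply further rules that this sketch ignores.

```python
from urllib.parse import urlsplit

# Assumption: only the top-level domains named in the text; the real list may be longer.
UK_TLDS = {"uk", "scot", "cymru", "london"}


def in_domain_scope(url):
    """Return True if the URL's host ends in one of the listed UK top-level domains."""
    host = urlsplit(url).hostname or ""
    # Take the last dot-separated label of the host name, ignoring a trailing dot.
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in UK_TLDS


candidates = ["https://www.bl.uk/", "https://example.scot/news", "https://example.com/"]
in_scope = [url for url in candidates if in_domain_scope(url)]
```

Here `in_scope` keeps the `.uk` and `.scot` addresses and drops the `.com` one.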
The crawl is archived in a shared infrastructure called the Digital Library System. Members of the public can nominate sites for preservation there through the UKWA website. The whole web archive is available to registered readers on library premises. Where permission has been given, or licence conditions can be met, copies are also accessible through the website. [6]
The archive also gathers sites in response to events and builds them into collections. These have preserved writing and imagery recording natural disasters, election campaigns since 2005 and the UK's blogosphere for research, among more than a hundred others. [7]
The UK Web Archive holds a collection of all the .uk websites that were archived by the Internet Archive until the end of March 2013. [8] SHINE is a web interface which can be used to create repeatable lists of results from historical .uk pages. Trends, the occurrences of keywords on .uk pages in the data set over that time, can be explored with a concordance that shows keywords in context. [9]
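The two ideas in this paragraph, a concordance and a keyword trend, can be illustrated with a short Python sketch. It is not SHINE's implementation: the function names, the `width` parameter and the sample `pages` list are all hypothetical, and the sketch works on plain strings rather than on archived pages.

```python
import re
from collections import Counter


def keyword_in_context(text, keyword, width=30):
    """Return (left, match, right) for each whole-word occurrence of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for match in pattern.finditer(text):
        # Keep up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


def keyword_trend(pages, keyword):
    """Count keyword occurrences per year from (year, text) pairs."""
    counts = Counter()
    for year, text in pages:
        counts[year] += len(keyword_in_context(text, keyword, width=0))
    return dict(counts)


# Hypothetical sample data standing in for archived page text.
pages = [
    (2005, "The archive holds pages. Each archive crawl adds more."),
    (2013, "No match here."),
]
rows = keyword_in_context(pages[0][1], "archive", width=5)
trend = keyword_trend(pages, "archive")
```

Each row in `rows` places the keyword between its left and right context, which is the layout a concordance uses; `trend` maps each year to a count.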
Memento is the name coined by the Memento Project for a prior version of a web page. The UK Web Archive Memento interface allows Mementos to be found across web archives. [10] The interface can be used to find a Memento by its date in a snapshot table, or to see how often a site appears across public web archives.
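Finding a Memento by date can be sketched in Python. Under the Memento protocol (RFC 7089) a client states the date it wants in an `Accept-Datetime` request header, and a TimeGate service chooses the nearest capture. The sketch below only builds that header and imitates the selection locally: the `archive.example` URLs, the capture list and both function names are hypothetical, and no request is sent to any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Return the Accept-Datetime request header for a timezone-aware datetime."""
    # RFC 7089 expects an HTTP-date expressed in GMT.
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def closest_memento(mementos, when):
    """Pick the (datetime, url) pair whose capture time is nearest to `when`."""
    return min(mementos, key=lambda pair: abs(pair[0] - when))


# Hypothetical capture list, as a snapshot table might present it.
captures = [
    (datetime(2010, 5, 6, tzinfo=timezone.utc), "https://archive.example/2010/http://example.co.uk/"),
    (datetime(2013, 3, 31, tzinfo=timezone.utc), "https://archive.example/2013/http://example.co.uk/"),
]
wanted = datetime(2012, 12, 1, tzinfo=timezone.utc)
headers = accept_datetime_header(wanted)
best = closest_memento(captures, wanted)
```

In practice the selection is done by the TimeGate, so `closest_memento` is only a stand-in for what the server would return.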
Research into the web as a reflection of society has helped develop access to the archive. [11] Libraries have developed guides to research skills needed to use web archives. These include using big data to see patterns or trends, [12] or writing citations for archived copies of websites. [13]
GLAM Workbench is a project which looks at how researchers can use data preserved by galleries, libraries, archives and museums. [14] It includes a collection of Jupyter notebooks which draw on Mementos and index data. [15] The notebooks mix description and editable code to help researchers find evidence in web archives.
Where the whole archive can be accessed, by Library | |||||
---|---|---|---|---|---|
Bodleian Libraries | British Library | Cambridge University Libraries | National Library of Scotland | National Library of Wales | Trinity College Dublin |
The Internet Archive is an American digital library founded on May 10, 1996, and chaired by free information advocate Brewster Kahle. It provides free access to collections of digitized materials like websites, software applications, music, audiovisual and print materials. The Archive is also an activist organization, advocating a free and open Internet. As of January 1, 2023, the Internet Archive holds more than 36 million print materials, 11.6 million pieces of audiovisual content, 2.5 million software programs, 15 million audio files, 4.5 million images, 251,000 concerts and over 808 billion web pages in its Wayback Machine.
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives. An implementation of OAI-PMH must support representing metadata in Dublin Core, but may also support additional representations.
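A harvesting request of this kind can be sketched in Python. OAI-PMH requests are ordinary HTTP requests whose `verb` argument names the operation, and `oai_dc` is the metadata prefix for the Dublin Core format that every implementation must support. The base URL and the function name below are hypothetical, and the sketch only builds the request URL without contacting a repository.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional argument restricting the harvest to one set of records.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository address used only for illustration.
url = list_records_url("https://repository.example.org/oai")
```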
Digital obsolescence is the risk of data loss when digital assets can no longer be accessed, because the hardware or software required for information retrieval has been repeatedly replaced by newer devices and systems, resulting in increasingly incompatible formats. The threat of an eventual "digital dark age" was met with little concern until the 1990s. Modern digital preservation efforts in the information and archival fields have since implemented protocols and strategies such as data migration and technical audits, and the salvage and emulation of antiquated hardware and software help to limit the potential damage to long-term information access.
In library and archival science, digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and application of preservation methods and technologies, and it combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time. The Association for Library Collections and Technical Services Preservation and Reformatting Section of the American Library Association defined digital preservation as a combination of "policies, strategies and actions that ensure access to digital content over time." According to Harrod's Librarian Glossary, digital preservation is the method of keeping digital material alive so that it remains usable as technological advances render original hardware and software specifications obsolete.
The Digital Preservation Coalition (DPC) is a UK-based non-profit that works with global partners to provide the resources needed to educate public and private entities on best practices for long-term digital preservation.
Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Wayback Machine, which strives to maintain an archive of the entire Web.
The Digital Curation Centre (DCC) was established to help solve the extensive challenges of digital preservation and digital curation and to lead research, development, advice, and support services for higher education institutions in the United Kingdom.
Apache Taverna was an open source software tool for designing and executing workflows, initially created by the myGrid project under the name Taverna Workbench, then a project under the Apache incubator. Taverna allowed users to integrate many different software components, including WSDL SOAP or REST Web services, such as those provided by the National Center for Biotechnology Information, the European Bioinformatics Institute, the DNA Databank of Japan (DDBJ), SoapLab, BioMOBY and EMBOSS. The set of available services was not finite and users could import new service descriptions into the Taverna Workbench.
The Digital Preservation Award is an international award sponsored by the Digital Preservation Coalition. The award 'recognises the many new initiatives being undertaken in the challenging field of digital preservation'. It was inaugurated in 2004. It was initially presented as part of the Institute of Conservation Conservation Awards. Since 2012 the prize has been presented independently. The prize includes a trophy and a cheque. Awards ceremonies have taken place at the British Library, the British Museum and the Wellcome Trust.
The British Library Preservation Advisory Centre was established as the National Preservation Office by the British Library Board in 1984, and was renamed the British Library Preservation Advisory Centre in 2009.
Rhizome is an American not-for-profit arts organization that supports and provides a platform for new media art.
The International Internet Preservation Consortium is an international organization of libraries and other organizations established to coordinate efforts to preserve internet content for the future. It was founded in July 2003 by 12 participating institutions, and had grown to 35 members by January 2010. As of January 2022, there are 52 members.
Trove is an Australian online library database owned by the National Library of Australia, which holds partnerships with source providers National and State Libraries Australia. It is an aggregator and service which includes full-text documents, digital images, bibliographic and holdings data for items which are not available digitally, and a free faceted-search engine as a discovery tool.
BlackPast.org is a web-based reference center that is dedicated primarily to the understanding of African-American history and Afro-Caribbean history and the history of people of Sub-Saharan African ancestry. In 2011 the American Library Association's Reference and User Services Association included it in its list of the 25 Best Free Reference Websites of the Year. According to BlackPast.org, the website has a global audience of about two million visitors per year from over 100 nations. In 2009, Canada, Australia, Great Britain, Brazil, and Germany ranked as the top five countries in visitors to the site after the United States. A 2008 website review described it as easily navigable and well organized but also as containing omissions among some features and as a work in progress. By 2009, the organization was selected by New York Public Library reference librarians as one of the top 25 hybrid print and electronic resources for the year.
The Internet Memory Foundation was a non-profit foundation whose purpose was archiving content of the World Wide Web. It supported projects and research that included the preservation and protection of digital media content in various forms to form a digital library of cultural content. As of August 2018, it was defunct.
The UK Government Web Archive (UKGWA) is part of The National Archives of the United Kingdom. The National Archives collects records from all UK government departments and bodies creating records defined as Public Records under the British Public Records Act. This includes on-line records. These are captured, preserved, and kept accessible by the UKGWA, in conjunction with an external service provider. Initially, and until July 2017, this was the Internet Memory Foundation. The current provider is MirrorWeb.