UK Web Archive

The UK Web Archive is a consortium of the six UK legal deposit libraries which aims to collect all UK websites at least once each year. [1] As of January 2025, its website is unavailable because of a cyberattack on the British Library in October 2023.

Established: 2005
Reference to legal mandate: Yes, provided in law
Website: www.webarchive.org.uk
Map: libraries providing access to the archive.

History

In 2005, the British Library, The National Archives, Wellcome Trust, National Library of Scotland, National Library of Wales and JISC formed the UK Web Archiving Consortium, a project to archive websites. [3]

UKWAC archived selected websites by licence or permission, using PANDAS software developed by the National Library of Australia. During the project its members collected sites relevant to their interests: the Wellcome Library collected medical sites, while the national libraries collected sites that reflect life in contemporary Wales or Scotland. The British Library worked with a broad policy of collecting sites of cultural, historical and political importance to the UK. [4]

The Consortium was wound up in 2010. The Archiving and Preservation Working Group took over UKWAC's co-ordinating role for web archiving in the UK. The Digital Preservation Coalition hosts the working group. [5]

Web archiving

The archive undertakes an annual crawl of .uk and other UK geographic Top Level Domains such as .scot, .cymru or .london.
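As a rough illustration of scoping a crawl by top-level domain, the sketch below checks whether a URL's hostname ends in one of the suffixes named above. The suffix list and the function name `in_uk_domain_scope` are assumptions made for this example. They are not the archive's actual scope rules, which may use other criteria as well.

```python
from urllib.parse import urlsplit

# Illustrative subset taken from the suffixes named in the text;
# this is an assumption, not the archive's real scope list.
UK_SUFFIXES = (".uk", ".scot", ".cymru", ".london")


def in_uk_domain_scope(url: str) -> bool:
    """Return True if the URL's hostname ends with one of the listed suffixes.

    URLs without a scheme (e.g. "www.bl.uk") have no parsed hostname
    and are therefore reported as out of scope.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host.endswith(UK_SUFFIXES)
```

The check looks only at the hostname, so a path such as `/uk` on a `.com` site does not count as in scope.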

A graph showing a small part of a crawl. Every circle is a different website, and every line represents a link that was followed between websites. The size of the circle represents how many pages were visited from that site, and the width of the line represents the number of links followed. UKWA Crawls: one hour in one minute

The crawl is archived in a shared infrastructure called the Digital Library System. Members of the public can nominate sites for preservation there through the UKWA website. The whole web archive is available to registered readers on library premises. Where permission has been given, or licence conditions can be met, copies are also accessible through the website. [6]

The archive also gathers sites in response to events and builds them into collections. These have preserved writing and imagery recording natural disasters, election campaigns since 2005 and the UK's blogosphere for research, among more than a hundred others. [7]

SHINE

SHINE graph showing how often different phrases for "year 2000 problem" appear between the years of 1996 and 2013 on archived .uk webpages.

The UK Web Archive holds a collection of all the .uk websites that were archived by the Internet Archive until the end of March 2013. [8] SHINE is a web interface to this data set which can be used to create repeatable lists of results from historical .uk pages. It can chart trends, that is, how often keywords occur on .uk pages over that time, and it uses concordance to show keywords in context. [9]
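The two ideas described here, trends and concordance, can be sketched in a few lines. The example below is a minimal illustration and not SHINE's implementation. The function names, the `(year, text)` input shape and the `width` parameter are all assumptions made for this sketch.

```python
import re
from collections import Counter


def keyword_in_context(text, keyword, width=30):
    """Return (left, match, right) tuples for each case-insensitive occurrence.

    `width` is the number of characters of context kept on each side.
    """
    rows = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows


def yearly_counts(pages, keyword):
    """Count, per year, the pages whose text mentions the keyword.

    `pages` is an iterable of (year, text) pairs (a hypothetical input shape).
    """
    counts = Counter()
    needle = keyword.lower()
    for year, text in pages:
        if needle in text.lower():
            counts[year] += 1
    return dict(sorted(counts.items()))
```

A trend graph would plot the output of `yearly_counts` for several phrases, while `keyword_in_context` gives the concordance rows for a single page.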

Mementos

A Memento is a prior version of a web page, a term coined by the Memento Project. The UK Web Archive Memento interface allows Mementos to be found across web archives. [10] The interface can be used to find a Memento by its date in a snapshot table, or to see how often a site appears across public web archives.
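The Memento protocol (RFC 7089) lets a client ask for a version of a page near a given date by sending an `Accept-Datetime` request header in HTTP date format. The sketch below formats such a header value and picks the snapshot closest to a target date from a list. It makes no network requests and does not show the UK Web Archive interface's own endpoints. The function names and the list of snapshot dates are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build an Accept-Datetime header for a timezone-aware datetime.

    The value is converted to UTC and rendered as an HTTP date ending in "GMT".
    """
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


def closest_snapshot(snapshots, target):
    """Return the snapshot datetime nearest to the target datetime."""
    return min(snapshots, key=lambda s: abs(s - target))


# Hypothetical snapshot dates, standing in for rows of a snapshot table.
SNAPSHOTS = [
    datetime(2005, 6, 1, tzinfo=timezone.utc),
    datetime(2010, 6, 1, tzinfo=timezone.utc),
    datetime(2013, 3, 31, tzinfo=timezone.utc),
]
```

A client would send the header from `accept_datetime_header` to a Memento TimeGate. `closest_snapshot` mirrors the date-based selection locally on dates already listed.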

Researching the archive

Research into the web as a reflection of society has helped develop access to the archive. [11] Libraries have developed guides to research skills needed to use web archives. These include using big data to see patterns or trends, [12] or writing citations for archived copies of websites. [13]

GLAM Workbench

GLAM Workbench is a project which looks at how researchers can use data preserved by galleries, libraries, archives and museums. [14] It includes a collection of Jupyter notebooks which draw on Mementos and index data. [15] The notebooks mix description and editable code to help researchers find evidence in web archives.

Where the whole archive can be accessed, by library

Bodleian Libraries
British Library
Cambridge University Libraries
National Library of Scotland
National Library of Wales
Trinity College Dublin

References

  1. "UKWA Home". www.webarchive.org.uk. Retrieved 13 October 2020.
  2. "The Legal Deposit Libraries (Non-Print Works) Regulations 2013". legislation.gov.uk. Retrieved 21 February 2022.
  3. "15 Years of the UK Web Archive - The Early Years - UK Web Archive blog". blogs.bl.uk. Archived from the original on 8 March 2020. Retrieved 13 October 2020.
  4. "UK Web Archiving Consortium: Evaluation Report". Digital Preservation Coalition. April 2006. Archived from the original on 9 January 2017. Retrieved 17 March 2014.
  5. "Web Archiving & Preservation Working Group - Digital Preservation Coalition". www.dpconline.org. Archived from the original on 31 July 2020. Retrieved 13 October 2020.
  6. "What is the UK Web Archive?". UK Web Archive. Archived from the original on 5 December 2019. Retrieved 17 March 2014.
  7. "15 Years of UKWA - Looking back at our first collections - UK Web Archive blog". blogs.bl.uk. Archived from the original on 29 July 2020. Retrieved 19 October 2020.
  8. www.webarchive.org.uk. "JISC UK Web Domain Dataset (1996-2013)". data.webarchive.org.uk. Retrieved 16 October 2020.
  9. "Trend results 1996-2013 for 'big data' :: SHINE". www.webarchive.org.uk. Retrieved 13 October 2020.
  10. "Mementos - Archived history of www.webarchive.org.uk". Mementos - Finding historical archives across the world wide web. Retrieved 9 October 2020.
  11. Blaney, Jonathan (19 April 2016). "More project case studies available". Big UK Domain Data for the Arts and Humanities. Archived from the original on 16 February 2017. Retrieved 9 October 2020.
  12. McNally, Anna. "LibGuides: Finding and Using Digital Archives during COVID-19: Web archives". libguides.westminster.ac.uk. Retrieved 14 October 2020.
  13. Thomas, Susan. "Oxford LibGuides: Web Archives: Home". ox.libguides.com. Retrieved 14 October 2020.
  14. "Welcome to the GLAM Workbench - GLAM Workbench". glam-workbench.github.io. Retrieved 13 October 2020.
  15. Sherratt, Tim; Jackson, Andrew (15 June 2020). "GLAM-Workbench/web-archives". Zenodo. Bibcode:2020zndo...3894079S. doi:10.5281/zenodo.3894079.
  16. National Records of Scotland Web Team (31 May 2013). "NRS Web Continuity Service". National Records of Scotland. Archived from the original on 18 January 2020. Retrieved 13 October 2020.
  17. "Search the PRONI Web Archive". nidirect. 9 December 2015. Archived from the original on 27 August 2020. Retrieved 13 October 2020.
  18. "MirrorWeb - UK Parliament Web Archive". webarchive.parliament.uk. Retrieved 13 October 2020.