Internet Memory Foundation

Company type: Non-profit foundation
Industry: Web archiving and preservation
Founded:
  • 2004 as European Archive
  • 2010 as Internet Memory
Defunct: August 2018
Headquarters: The Netherlands
Website: internetmemory.org/en/

The Internet Memory Foundation (formerly the European Archive Foundation) was a non-profit foundation whose purpose was archiving content of the World Wide Web. It supported projects and research on the preservation and protection of digital media content in various forms, with the aim of building a digital library of cultural content. As of August 2018, it is defunct.


History

The non-profit European Archive Foundation was incorporated in 2004 in Amsterdam. [1] An announcement at the opening of the Cross Media Week in Amsterdam in September 2006 included a quote from Brewster Kahle, who founded the Internet Archive. [2] Julien Masanès was its first director. [3] Operating from Amsterdam and Paris, the foundation said it would make public domain collections and web archives freely accessible. Masanès, previously at the Bibliothèque nationale de France, edited a book on Web archiving in 2007. [4] The Paris organization is called Internet Memory Research, which operates a service known as ArchiveTheNet. [5]

In December 2010, the Foundation changed its name to Internet Memory Foundation to express its goal of preserving internet content for current and future generations. [6]

The foundation had many partners, including cultural and research institutions that collaborated on its web archiving projects. These partners included the UK National Archives, [7] the Max Planck Institute, Technische Universität Berlin, the University of Southampton, and the Institut Mines-Télécom. The foundation was also a member of the International Internet Preservation Consortium. [8]

Research

The foundation was involved in research projects to improve technologies for web crawling, data extraction, text mining, and preservation, in support of the growth and use of web archives. Its projects were funded by the European Commission through the Seventh Framework Programme (FP7).

Collections

Audio and video

Before focusing on web archiving, the European Archive Foundation had assembled one of the largest free online collections of classical music (more than 800 pieces, from Mozart to Dvořák) and a collection of public information films from the British Government, in collaboration with the Netherlands Institute for Sound and Vision and the UK National Archives.

Selective web collection

The foundation archived a snapshot of the EU institutions' websites, made in collaboration with the Historical Archives of the European Union in Italy; an archive of political websites of the 25 EU member states, [18] captured during the European constitutional debate; and other archives.

The Web crawler used by the project was Heritrix version 3. Heritrix generates resources stored in a standardised archiving "container" format, the ARC file (.arc). The ARC file was extended to the Web ARChive file format (.warc), which was approved as an international standard in June 2009 (current edition ISO 28500:2017). [20]
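The "container" idea can be illustrated with a minimal sketch of a single WARC-style record: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. This is not Heritrix code and not the foundation's tooling; the function names `build_warc_record` and `parse_warc_record` are hypothetical, the `WARC/1.0` version line and the `resource` record type are chosen only for illustration, and the header set is the small mandatory core plus a target URI and content type.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, payload: bytes,
                      record_type: str = "resource") -> bytes:
    """Serialize one WARC-style record (illustrative, not a full implementation)."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # Blank line ends the header; two CRLFs close the record.
    return head + CRLF + payload + CRLF + CRLF


def parse_warc_record(raw: bytes):
    """Split one record back into (version, headers, payload)."""
    # The header ends at the first blank line; the payload may itself contain CRLFs,
    # so its extent is taken from Content-Length rather than from delimiters.
    head, _, rest = raw.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version = lines[0].decode("utf-8")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


if __name__ == "__main__":
    record = build_warc_record("http://example.org/", b"archived page body")
    version, headers, body = parse_warc_record(record)
    print(version, headers["WARC-Type"], len(body))
```

A real archive file concatenates many such records (often individually compressed), and production tools handle further record types and fields that this sketch omits.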


References

  1. Mia Consalvo; Charles Ess, eds. (2011). The Handbook of Internet Studies. John Wiley & Sons. p. 31. ISBN 9781444342383.
  2. Masanès, Julien (28 September 2006). "Official Launch of the European Archive Foundation" (Press release).
  3. "Official Launch of the European Archive Foundation". Press release. 28 September 2006. Retrieved 7 October 2013.
  4. Julien Masanès, ed. (2007). Web Archiving. Springer. ISBN 9783540463320.
  5. "À propos: Internet Memory". Web site for ArchiveThe.net. Retrieved 7 October 2013. (in French)
  6. "Internet Memory Foundation". International Internet Preservation Consortium. Archived from the original on 9 April 2014. Retrieved 8 April 2014.
  7. Diana Maynard; Mark A. Greenwood (16 May 2012). "Large Scale Semantic Annotation, Indexing, and Search at The National Archives" (PDF). International Conference on Language Resources and Evaluation.
  8. Members (International Internet Preservation Consortium)
  9. "Scalable Preservation Environments". Community Research and Development Information Service web site. European Union. Retrieved 7 October 2013.
  10. "Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams". Community Research and Development Information Service. European Union. Retrieved 25 April 2016.
  11. "ARchive COmmunities MEMories". Community Research and Development Information Service web site. European Union. Retrieved 7 October 2013.
  12. "Web Archiving in Europe: A survey provided by the Internet Memory Foundation, 2010" (PDF). 22 March 2011. Retrieved 8 April 2014.
  13. "Longitudinal Analytics of Web Archive data". Community Research and Development Information Service web site. European Union. Retrieved 7 October 2013.
  14. "LivingKnowledge Facts, Opinions and Bias in Time". Community Research and Development Information Service web site. European Union. Retrieved 7 October 2013.
  15. "Living Web Archives". Community Research and Development Information Service web site. European Union. Retrieved 7 October 2013.
  16. "Report on "Technologies for Living Web archives"" (PDF). Deliverable report. 10 February 2011. Retrieved 7 October 2013.
  17. Dimitar Denev; Arturas Mazeika; Marc Spaniol; Gerhard Weikum (April 2011). "The SHARC framework for data quality in Web archiving". The International Journal on Very Large Data Bases. Springer-Verlag. 20 (2): 183–207. doi:10.1007/s00778-011-0219-9. S2CID 18258396.
  18. "The Historical Archives pilots archiving of EU Institutions websites". EUI Historical Archives of the European Union. Retrieved 18 August 2021.
  19. Adrian Brown (2006). Archiving websites: a practical guide for information management professionals. Facet Publishing. pp. 17–18. ISBN 9781856045537.
  20. "ISO 28500:2017".
  • Living Knowledge
  • LAWA, Longitudinal Analytics of Web Archive Data
  • ARCOMEM, European Archives, Museums and Libraries in the Age of the Social Web
  • SCAPE, Scalable Preservation Environments
  • LiWA, Living Web Archives