PADICAT

URL http://www.padicat.cat/

PADICAT, an acronym for Patrimoni Digital de Catalunya (Catalan for "Digital Heritage of Catalonia"), is the web archive of Catalonia. [1]

PADICAT was created in 2005 [2] by the Biblioteca de Catalunya, the public institution responsible for collecting, preserving and disseminating Catalonia's bibliographic heritage and, by extension, its digital heritage. The Center for Scientific and Academic Services of Catalonia (CESCA) provides technological support for preserving and giving access to old versions of web pages published on the Internet. As the institution responsible for PADICAT, the Biblioteca de Catalunya is a member of the International Internet Preservation Consortium (IIPC). [3]

History

PADICAT website 2011

PADICAT was launched in 2005, following the lead of other national libraries in creating web archives and in response to UNESCO's publication of the guidelines for the preservation of digital heritage. [4] Many web archives are in operation. [5] The best known began in 1996: the Swedish Kulturarw3, [6] the Australian Pandora, [7] and the most popular repository, the Internet Archive. [8]

Analysis of these and other projects informed the planning of PADICAT, which follows the hybrid model common around the world: the regular capture of a whole geographical domain (the .cat domain in this case) is complemented by selective actions that extend coverage to social events generating intense activity on the network (electoral campaigns, for instance) and to thematic packages (museums of Catalonia, Catalan folk-rock on the web, etc.). PADICAT further complements this with user contributions in the form of recommended websites.
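The hybrid model can be illustrated with a minimal sketch of a capture-scope check. It is not PADICAT's actual configuration: the function name `in_scope` and the host list `SELECTED_HOSTS` are hypothetical, and the sketch only assumes that a page is captured either because it belongs to the .cat domain or because its host is on a selective list (a thematic package, an electoral campaign, or a user recommendation).

```python
from urllib.parse import urlsplit

# Hypothetical selective list: hosts outside .cat chosen by curators
# or recommended by users. Not taken from the real project.
SELECTED_HOSTS = {"www.example-museum.org", "campaign.example.com"}


def in_scope(url, selected_hosts=SELECTED_HOSTS):
    """Return True if the URL falls under the domain crawl or the selective list."""
    host = urlsplit(url).hostname or ""  # hostname is lowercased by urlsplit
    if host.endswith(".cat"):            # whole-domain capture
        return True
    return host in selected_hosts        # selective capture
```

For example, `in_scope("http://www.padicat.cat/")` is true through the domain rule, while `in_scope("http://www.example-museum.org/expo")` is true only because the host is on the selective list.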

In June 2005, the Biblioteca de Catalunya began the preliminary planning phase, which analyzed existing resources, the agents involved in producing web pages in Catalonia, and the legal issues governing the intended practices.

Based on parameters defined by the Biblioteca de Catalunya, the automatic collection of websites likely to form part of the digital heritage of Catalonia began on July 21, 2006. On September 11, 2006, coinciding with the National Day of Catalonia, the PADICAT website was opened to the public, with about thirty web pages stored.

The 2006–08 period was the production phase, covering the project plan, the pilot and the start of PADICAT's operation: the systematic capture of web pages of Catalonia.

For the 2009–2011 period, the aim was for the Biblioteca de Catalunya to be in an optimum position, with the system, a pioneer in Spain and a benchmark in Europe, operating at full capacity. In addition, cooperation agreements were reached with more than 450 institutions of all kinds, and open online access to the whole collection was guaranteed. On September 11, 2011, coinciding again with the National Day of Catalonia and with the fifth anniversary of its website, PADICAT launched a new version of the website giving access to all deposited content.

As of November 2012, PADICAT had preserved 58,122 websites, 249,609 crawls and 349 million files, occupying 13 TB of disk space. All of this content is freely available. [9]

Mission and functioning

Mission and objectives

The mission of PADICAT is to harvest, process and provide access to the digital heritage of Catalonia born on the Internet. Its objectives are:

Following its birth (2005–2006), growth (2007–2008) and consolidation (2009–2011) phases, the aim since 2012 has been to systematize its capacity for growth, with the goal of incorporating 75,700 versions of about 32,000 websites per year, from:

In addition, there are four permanent work areas:

Functioning

Software

PADICAT software workflow schema

PADICAT is built on several pieces of software that allow web pages to be collected, stored, organized, preserved and permanently accessed. After an analysis and software-testing phase, Heritrix [12] was chosen; it is used in most projects that capture digital resources. Heritrix collects web pages as users see them when browsing the Internet and stores them in compressed files with the ARC or WARC extension. Heritrix is complemented by NutchWax, [13] or by a combination of Hadoop [14] and Wayback, [15] which index the collected information so that resources in the collection can be located through query interfaces: Wera, [16] which supports keyword searches over the indexes generated by NutchWax; and Wayback, which supports lookup by URL in the indexes generated by Hadoop and by Wayback itself.
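The difference between the two query interfaces can be shown with a toy, in-memory sketch. It does not use the NutchWax, Hadoop, Wera or Wayback APIs: the capture records, function names and 14-digit timestamp strings below are all illustrative assumptions. It only contrasts an index keyed by URL (lookup of all versions of a page) with an inverted index keyed by word (keyword search).

```python
from collections import defaultdict

# Hypothetical captures: (url, capture timestamp, extracted text).
CAPTURES = [
    ("http://example.cat/", "20060911120000", "museus de Catalunya"),
    ("http://example.cat/", "20110911120000", "museus i arxius"),
    ("http://folk.example.cat/", "20080101000000", "folk rock"),
]


def build_url_index(captures):
    """Map each URL to its sorted capture timestamps (URL-based lookup)."""
    index = defaultdict(list)
    for url, timestamp, _text in captures:
        index[url].append(timestamp)
    for timestamps in index.values():
        timestamps.sort()
    return dict(index)


def build_keyword_index(captures):
    """Map each lowercased word to the captures containing it (keyword search)."""
    index = defaultdict(set)
    for url, timestamp, text in captures:
        for word in text.lower().split():
            index[word].add((url, timestamp))
    return dict(index)


def versions_of(url_index, url):
    """Return all stored versions of a URL, oldest first."""
    return url_index.get(url, [])


def search(keyword_index, word):
    """Return the captures whose text contains the word, in sorted order."""
    return sorted(keyword_index.get(word.lower(), set()))
```

In this sketch, `versions_of` answers the question "which versions of this address exist?", while `search` answers "which captures mention this word?"; a production system would build both indexes from the ARC or WARC files rather than from an in-memory list.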

Web Curator Tool, [17] developed by the National Library of New Zealand and the British Library, has been used as a document management system for assigning metadata to a significant part of the collection, so that the archive's holdings can in future be searched from other catalogs, whether those of the Biblioteca de Catalunya or of other institutions. Websites are currently cataloged with CAT, [18] software developed specifically for the project by CESCA technicians.

Hardware

PADICAT servers at CESCA

The hardware supporting the system consists of six HP ProLiant DL360 G4p nodes, which handle the collection and indexing of web pages. Searching and viewing results in the web interface is handled by a high-availability Linux cluster, with request load balancing and fault tolerance in case any of the platform's nodes fails. A NetApp FAS3170 storage array presents 19 TB of disk capacity to these nodes via NFS.

The nodes are connected by fibre to a storage area network (SAN), which is complemented by a data backup robot.

The contents deposited in PADICAT are expected to be included in COFRE [19] (COnservem per al Futur Recursos Electrònics), a high-security preservation system created for the Biblioteca de Catalunya.

References

  1. Official website
  2. Biblioteca de Catalunya (2005), Memòria del plantejament del projecte PADICAT (Patrimoni Digital de Catalunya), Barcelona: Biblioteca de Catalunya, retrieved 2012-11-22
  3. International Internet Preservation Consortium
  4. National Library of Australia (2003), Guidelines for the preservation of digital heritage (PDF), Canberra: UNESCO, retrieved 2012-11-22
  5. Llueca, Ciro (2005), Webs sempre accessibles : les biblioteques nacionals i els dipòsits digitals nacionals, BiD: textos universitaris de biblioteconomia i documentació, retrieved 2012-11-20
  6. Kulturarw3
  7. Pandora
  8. Internet Archive
  9. PADICAT
  10. Cooperation agreement between the Biblioteca de Catalunya and fundació puntCAT, for the preservation of web pages, has been signed
  11. Llueca, Ciro; Cócera, Daniel; Torres, Natàlia; et al. (2012), A ritmo de tweet: archivando elecciones 2.0 (PDF), El profesional de la información, retrieved 2012-11-21
  12. Heritrix
  13. NutchWax
  14. Hadoop
  15. Wayback
  16. Wera
  17. Web Curator Tool
  18. Llueca, Ciro; Cócera, Daniel; Torres, Natàlia; et al. (2010), CAT (Curator Archiving Tool): improving access to web archives = CAT (Curator Archiving Tool): millorant l'accés als arxius web = CAT (Curator Archiving Tool): mejorando el acceso a los archivos web (PDF), retrieved 2012-11-21
  19. Serra, Eugènia; Pérez, Karibel; Llueca, Ciro (2012), "La Biblioteca de Catalunya i l'accés al patrimoni digital", Métodos de Informacion, MEI, 2 (2): 5–20, doi: 10.5557/IIMEI2-N2-005020, retrieved 2012-11-21