HathiTrust

Type of site: Digital library
Owner: University consortium
Revenue: US$3,777,445 (2019 projections for proposal) [1]
URL: hathitrust.org
Commercial: Partially [2]
Launched: October 2008
Current status: Active
Content license: Public domain (with restrictions on Google scans), various [3]
Written in: Perl, Java [2]

HathiTrust Digital Library is a large-scale collaborative repository of digital content from research libraries, including content digitized via the Google Books and Internet Archive digitization initiatives as well as content digitized locally by libraries.

History

HathiTrust was founded in October 2008 by the twelve universities of the Committee on Institutional Cooperation and the eleven libraries of the University of California. [4] The partnership includes over 60 research libraries [5] across the United States, Canada, and Europe, and is based on a shared governance structure. Costs are shared by the participating libraries and library consortia. [6] The repository is administered by the University of Michigan. [7] The executive director of HathiTrust is Mike Furlough. [8] The HathiTrust Shared Print Program is a distributed collective collection whose participating libraries have committed to retaining almost 18 million monograph volumes for 25 years, representing three-quarters of HathiTrust digital book holdings. [9]

In September 2011, the Authors Guild sued HathiTrust (Authors Guild, Inc. v. HathiTrust), alleging massive copyright violation. [10] A federal court ruled against the Authors Guild in October 2012, finding that HathiTrust's use of books scanned by Google was fair use under US law. [11] The court's opinion relied on the transformativeness doctrine of federal copyright law, holding that the Trust had transformed the copyrighted works without infringing on the copyright holders' rights. That decision was largely affirmed by the Second Circuit on June 10, 2014, which found that providing search and accessibility for the visually impaired were grounds to consider the service transformative and fair use, and remanded to the lower court to reconsider whether the plaintiffs had standing to sue regarding HathiTrust's library preservation copies. [12]

In October 2015, HathiTrust comprised over 13.7 million volumes, including 5.3 million in the public domain in the United States. HathiTrust provides a number of discovery and access services, notably full-text search across the entire repository. In 2016, over 6.17 million users located in the United States and in 236 other nations used HathiTrust in 10.92 million sessions. [13]

As of 2021, the copyright policy states that "many works in our collection are protected by copyright law, so we cannot ordinarily publicly display large portions of those protected works unless we have permission from the copyright holder", and thus "if we cannot determine the copyright or permission status of a work, we restrict access to that work until we can establish its status. Because of differences in international copyright laws, access is also restricted for users outside the United States to works published outside the United States after and including 1896." [14]

PageTurner

PageTurner is the web application on the HathiTrust website for viewing publications. [15] From PageTurner readers can navigate through a publication, download a PDF version of it, and view pages in different ways, such as one page at a time, scrolling, flipping, or thumbnail views. [15] [16]

Emergency Temporary Access Service

The Emergency Temporary Access Service (ETAS) [17] is a service provided by HathiTrust for certain special situations, such as the closure of a library for a public health emergency. It allows users of HathiTrust member libraries to obtain lawful access to copyrighted digital materials in place of the corresponding physical books held by the same library, through the controlled digital lending model.

Etymology

Hathi, pronounced "hah-tee", is the Hindi word for "elephant", an animal famed for its long-term memory. [18]

References

  1. "2018 Member Meeting" (PDF). hathitrust.org. October 2018. p. 56. Archived (PDF) from the original on 2019-01-18. Retrieved 2018-12-31. Slides in PDF.
  2. "Technological Profile". hathitrust.org. Archived from the original on 16 June 2016. Retrieved 12 May 2016.
  3. "Access and Use Policies". hathitrust.org. Archived from the original on 16 June 2016. Retrieved 12 May 2016.
  4. Karels, Liene (November 2010). "HathiTrust adds new members, goes global". umich.edu. Archived from the original on 2014-03-02. Retrieved 2019-06-21.
  5. "HathiTrust Partnership Community". hathitrust.org. Archived from the original on 5 September 2015. Retrieved 28 August 2015.
  6. "Cost". hathitrust.org. Archived from the original on 2019-01-18. Retrieved 2019-06-21.
  7. "Governance". hathitrust.org. Archived from the original on 2019-06-21. Retrieved 2019-06-21.
  8. "HathiTrust Staff". hathitrust.org. Archived from the original on 2019-06-21. Retrieved 2019-06-21.
  9. "Shared Print Program | www.hathitrust.org | HathiTrust Digital Library". www.hathitrust.org. Archived from the original on 2019-11-27. Retrieved 2019-12-18.
  10. Bosman (September 12, 2011). "Lawsuit Seeks the Removal of a Digital Book Collection". The New York Times. Archived from the original on December 19, 2012. Retrieved November 1, 2012.
  11. Albanese, Andrew (11 October 2012). "Google Scanning Is Fair Use Says Judge". Publishers Weekly. Archived from the original on 15 October 2012. Retrieved 11 October 2012.
  12. Authors Guild v. HathiTrust (2d Cir. June 10, 2014). Archived August 8, 2014, at the Wayback Machine.
  13. Zaytsev, Angelina (February 2017). "14 Million Books & 6 Million Visitors: HathiTrust Growth and Usage in 2016" (PDF). hathitrust.org. Archived (PDF) from the original on 2017-02-24. Retrieved 2019-06-21.
  14. "Trust copyright policy - restrictions on access". Archived from the original on July 26, 2011.
  15. Meltzer, Ellen (May 9, 2011). "Viewing HathiTrust books just got better". cdlib.org. California Digital Library. Archived from the original on June 21, 2019. Retrieved 2019-06-21.
  16. "HathiTrust User's Guide" (PDF). hathitrust.org. May 2012. p. 8. Archived (PDF) from the original on 2017-01-09. Retrieved 2019-06-21.
  17. "Emergency Temporary Access Service". HathiTrust Digital Library.
  18. "Launch of HathiTrust: Major Library Partners Launch HathiTrust Shared Digital Repository" (Press release). HathiTrust. October 13, 2008. Archived from the original on August 19, 2019. Retrieved 2019-06-21.