Type of site | Digital library |
---|---|
Owner | University consortium |
Revenue | US$3,777,445 (2019 projections for proposal) [1] |
URL | hathitrust |
Commercial | Partially [2] |
Launched | October 2008 |
Current status | Active |
Content license | Public domain (with restrictions on Google scans), various [3] |
Written in | Perl, Java [2] |
HathiTrust Digital Library is a large-scale collaborative repository of digital content from research libraries including content digitized via Google Books and the Internet Archive digitization initiatives, as well as content digitized locally by libraries.
Hathi (IPA: [ˈhɑːti]), derived from the Sanskrit hastin, is the Hindi word for 'elephant', an animal famed for its long-term memory. [4]
HathiTrust was founded in October 2008 by the twelve universities of the Committee on Institutional Cooperation and the eleven libraries of the University of California. [5] As of 2024, its membership includes more than 219 research libraries [6] across the United States, Canada, and Europe, and the organization is based on a shared governance structure. Costs are shared by the participating libraries and library consortia. [7] The repository is administered by the University of Michigan. [8] The executive director of HathiTrust is Mike Furlough, [9] who succeeded founding director John Wilkin after Wilkin stepped down in 2013. [10] The HathiTrust Shared Print Program is a distributed collective collection whose participating libraries have committed to retaining almost 18 million monograph volumes for 25 years, representing three-quarters of HathiTrust digital book holdings. [11]
In September 2011, the Authors Guild sued HathiTrust (Authors Guild, Inc. v. HathiTrust), alleging massive copyright violation. [12] A federal court ruled against the Authors Guild in October 2012, finding that HathiTrust's use of books scanned by Google was fair use under US law. [13] The court's opinion relied on the transformativeness doctrine of federal copyright law, holding that HathiTrust had transformed the copyrighted works without infringing on the copyright holders' rights. That decision was largely affirmed by the Second Circuit on June 10, 2014, which found that providing search and accessibility for the visually impaired were grounds to consider the service transformative and fair use, and remanded to the lower court to reconsider whether the plaintiffs had standing to sue regarding HathiTrust's library preservation copies. [14]
In September 2024, HathiTrust comprised more than 18 million volumes, including 6.7 million in the public domain in the United States. HathiTrust provides a number of discovery and access services, most notably full-text search across the entire repository.
As of 2021, the copyright policy states that "many works in our collection are protected by copyright law, so we cannot ordinarily publicly display large portions of those protected works unless we have permission from the copyright holder", and thus "if we cannot determine the copyright or permission status of a work, we restrict access to that work until we can establish its status. Because of differences in international copyright laws, access is also restricted for users outside the United States to works published outside the United States after and including 1896." [15]
PageTurner is the web application on the HathiTrust website for viewing publications. [16] From PageTurner, readers can navigate through a publication, download a PDF version of it, and view pages in different ways, such as one page at a time, scrolling, flipping, or thumbnail views. [16] [17]
The Emergency Temporary Access Service [18] (ETAS) is a service provided by HathiTrust that makes it possible in certain special situations, such as closure of a library for a public health emergency, for users of HathiTrust member libraries to obtain lawful access to copyrighted digital materials in place of the corresponding physical books held by the same library.
Fair use is a doctrine in United States law that permits limited use of copyrighted material without having to first acquire permission from the copyright holder. Fair use is one of the limitations to copyright intended to balance the interests of copyright holders with the public interest in the wider distribution and use of creative works by allowing, as a defense to copyright infringement claims, certain limited uses that might otherwise be considered infringement. The U.S. "fair use doctrine" is generally broader than the "fair dealing" rights known in most countries that inherited English Common Law. The fair use right is a general exception that applies to all different kinds of uses with all types of works. In the U.S., the fair use right/exception is based on a flexible proportionality test that examines the purpose of the use, the amount used, and the impact on the market of the original work.
The Internet Archive is an American nonprofit organization founded in 1996 by Brewster Kahle that runs a digital library website, archive.org. It provides free access to collections of digitized materials including websites, software applications, music, audiovisual, and print materials. The Archive also advocates a free and open Internet. As of September 5, 2024, the Internet Archive held more than 42.1 million print materials, 13 million videos, 1.2 million software programs, 14 million audio files, 5 million images, 272,660 concerts, and over 866 billion web pages in its Wayback Machine. Its stated mission is to provide "universal access to all knowledge".
Electronic publishing includes the digital publication of e-books, digital magazines, and the development of digital libraries and catalogues. It also includes the editing of books, journals, and magazines to be posted on a screen.
Digitization is the process of converting information into a digital format. The result is the representation of an object, image, sound, document, or signal obtained by generating a series of numbers that describe a discrete set of points or samples. The result is called digital representation or, more specifically, a digital image, for the object, and digital form, for the signal. In modern practice, the digitized data is in the form of binary numbers, which facilitates processing by digital computers and other operations, but digitizing simply means "the conversion of analog source material into a numerical format"; the decimal or any other number system can be used instead.
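The sampling-and-numbering idea described above can be illustrated with a minimal sketch. It assumes a hypothetical analog source (a 1 Hz sine tone) and arbitrarily chosen parameters (8 samples per second, 4 bits per sample); the function names `digitize` and `analog_tone` are illustrative only and do not come from any particular digitization system.

```python
import math


def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous signal at discrete points and map each
    sample to an integer code (uniform quantization)."""
    levels = 2 ** bits                      # number of distinct codes
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz              # discrete sampling instant
        value = signal(t)                   # assumed to lie in [-1.0, 1.0]
        value = max(-1.0, min(1.0, value))  # clip anything out of range
        # Map [-1.0, 1.0] onto the integer codes 0 .. levels - 1.
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes


def analog_tone(t):
    """Hypothetical analog source: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * 1.0 * t)


samples = digitize(analog_tone, duration_s=1.0, sample_rate_hz=8, bits=4)
print(samples)                               # the codes as decimal numbers
print([format(c, "04b") for c in samples])   # the same codes in binary
```

The two print statements show the same series of numbers in decimal and in binary, reflecting the point that binary is the usual storage form but not the only possible numerical format.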
In library and archival science, digital preservation is a formal process to ensure that digital information of continuing value remains accessible and usable in the long term. It involves planning, resource allocation, and application of preservation methods and technologies, and combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
The University of Michigan Library is the academic library system of the University of Michigan. The university's 38 constituent and affiliated libraries together make it the second largest research library by number of volumes in the United States.
Google Books is a service from Google that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database. Books are provided either by publishers and authors through the Google Books Partner Program, or by Google's library partners through the Library Project. Additionally, Google has partnered with a number of magazine publishers to digitize their archives.
The California Digital Library (CDL) was founded by the University of California in 1997. Under the leadership of then UC President Richard C. Atkinson, the CDL's original mission was to forge a better system for scholarly information management and improved support for teaching and research. In collaboration with the ten University of California Libraries and other partners, CDL assembled one of the world's largest digital research libraries. CDL facilitates the licensing of online materials and develops shared services used throughout the UC system. Building on the foundations of the Melvyl Catalog, CDL has developed one of the largest online library catalogs in the country and works in partnership with the UC campuses to bring the treasures of California's libraries, museums, and cultural heritage organizations to the world. CDL continues to explore how services such as digital curation, scholarly publishing, archiving and preservation support research throughout the information lifecycle.
Book scanning or book digitization is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner. Large-scale book scanning projects have made many books available online.
The Michigan Digitization Project is a project in partnership with Google Books to digitize the entire print collection of the University of Michigan Library. The digitized collection is available through the University of Michigan Library catalog, Mirlyn, the HathiTrust Digital Library, and Google Books. The full text of works that are out of copyright or in the public domain is available.
A digital library is an online database of digital objects that can include text, still images, audio, video, digital documents, or other digital media formats or a library accessible through the internet. Objects can consist of digitized content like print or photographs, as well as originally produced digital content like word processor files or social media posts. In addition to storing content, digital libraries provide means for organizing, searching, and retrieving the content contained in the collection. Digital libraries can vary immensely in size and scope, and can be maintained by individuals or organizations. The digital content may be stored locally, or accessed remotely via computer networks. These information retrieval systems are able to exchange information with each other through interoperability and sustainability.
Authors Guild v. Google, 804 F.3d 202, was a copyright case heard in federal court for the Southern District of New York, and then the Second Circuit Court of Appeals between 2005 and 2015. It concerned fair use in copyright law and the transformation of printed copyrighted books into an online searchable database through scanning and digitization. It centered on the legality of the Google Book Search Library Partner project that had been launched in 2003.
McGill University Library is the library system of McGill University in Montréal, Québec, Canada. It comprises 13 branch libraries, located on the downtown Montreal and Macdonald campuses, holding over 11.78 million items. It is the fourth-largest research-intensive academic library in Canada.
An orphan work is a work whose copyright owner is impossible to identify or contact. This inability to request permission from the copyright owner often means orphan works cannot be used in new works or digitized, except when fair use exceptions apply. Until recently, public libraries could not digitize orphaned books without risking being fined up to $150,000 if the owner of the copyright were to come forward. This problem was briefly addressed in the 2011 case Authors Guild, Inc. v. Google, but the settlement in that case was later overturned.
The Digital Public Library of America (DPLA) is a US project aimed at providing public access to digital holdings in order to create a large-scale public digital library. It officially launched on April 18, 2013, after two-and-a-half years of development.
Gloriana St. Clair is a pioneer in the field of academic librarianship, as well as a scholar of Norse Mythology and its relationship to the works of J.R.R. Tolkien. She is currently the Principal Investigator of the Olive Executable Archive, and was previously the official University Liaison to the Pittsburgh chapter of the Osher Lifelong Learning Institute at Carnegie Mellon University. She is Dean Emerita of Carnegie Mellon University Libraries (1998-2013). Before coming to Carnegie Mellon, St. Clair held leadership positions at several other universities. St. Clair attended the University of California, Berkeley, receiving a bachelor's degree in English in 1962 and a master's degree in library science in 1963.
A memory institution is an organization maintaining a repository of public knowledge, a generic term used about institutions such as libraries, archives, heritage institutions, aquaria and arboreta, and zoological and botanical gardens, as well as providers of digital libraries and data aggregation services which serve as memories for given societies or mankind. Memory institutions serve the purpose of documenting, contextualizing, preserving and indexing elements of human culture and collective memory. These institutions enable a society to better understand itself, its past, and how the past impacts its future. These repositories are ultimately preservers of communities, languages, cultures, customs, tribes, and individuality. Memory institutions are repositories of knowledge, while also being actors in the transmission of knowledge and memory to the community. These institutions ultimately remain some form of collective memory. Increasingly, such institutions are considered part of a unified documentation and information science perspective.
Authors Guild v. HathiTrust, 755 F.3d 87, is a United States copyright decision finding search and accessibility uses of digitized books to be fair use.
John Price Wilkin is an American librarian whose work has primarily been in development of digital library technologies and research library management. He was formerly the Juanita J. and Robert E. Simpson Dean of Libraries and University Librarian at the University of Illinois at Urbana-Champaign, and is currently the Chief Executive Officer of Lyrasis.
Controlled digital lending (CDL) is a model by which libraries digitize materials in their collection and make them available for lending. It is based on interpretations of the United States copyright principles of fair use and copyright exhaustion.