Million Book Project

The Million Book Project (or the Universal Library) was a book digitization project led by Carnegie Mellon University School of Computer Science and University Libraries [1] from 2007 to 2008. Working with government and research partners in India (Digital Library of India) and China, the project scanned books in many languages, used OCR to enable full-text searching, and provided free-to-read access to the books on the web. By 2007, the project had scanned 1 million books and made the entire catalog accessible online.

Description

The Million Book Project was a 501(c)(3) charity organization with various scanning centers throughout the world.

By December 2007, more than 1.5 million books had been scanned in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in Telugu, and 40,000 in Arabic. [2] Most of the books are in the public domain, but permission was acquired to include over 60,000 copyrighted books (roughly 53,000 in English and 7,000 in Indian languages). The books are mirrored in part at sites in India and China, at Carnegie Mellon, at the Internet Archive, and at Bibliotheca Alexandrina. Not all of the scanned books are available online, and no single site has copies of all the books that are available online.

The Million Book Project was a "proof of concept" that has largely been superseded by the HathiTrust, Google Book Search, and Internet Archive book scanning projects.

The Internet Archive may have some books that Google does not, for example The Poems of Robert Frost, published after the end of 1922. [3] [4] [5]

The National Science Foundation (NSF) awarded Carnegie Mellon $3.63M over four years for equipment and administrative travel for the Million Book Project. India provided $25M annually to support language translation research projects. The Ministry of Education in China provided $8.46M over three years. The Internet Archive provided equipment, staff and money. The University of California, Merced Library funded the work to acquire copyright permission from U.S. publishers.

The program ended in 2008. [6]

Partner institutions

China

The institutions in China which are participants in this project include: [1]

India

The institutions in India which are participants in this project include: [1]

United States

The institutions in the U.S. which are participants include: [1]

Europe

The institutions in the EU which are participants include: [1]

References

  1. "ULIB [About Us]". Carnegie Mellon University. Archived from the original on 2012-01-08.
  2. "The Million Book Project - 1.5 million scanned!". London Business School Library. Archived from the original on 2008-06-14.
  3. "Universal Library : Free Books : Free Texts : Download & Streaming : Internet Archive". Retrieved 2016-02-05.
  4. "The Poems Of Robert Frost". Retrieved 2016-02-05.
  5. Frost, Robert (1949). "The Poems of Robert Frost - Google Books". Retrieved 2016-02-05.
  6. "Universal Digital Library". UCSB Library. November 9, 2016.