The Index Thomisticus was a digital humanities project begun in the 1940s that created a concordance to 179 texts centering around Thomas Aquinas. Led by Roberto Busa, the project indexed 10,631,980 words over the course of 34 years, initially onto punched cards. It is considered a pioneering project in the field of digital humanities.
Busa began the project in 1946. [1] In 1949, IBM agreed to sponsor the project until its completion. [2] The company assigned Paul Tasman, one of its executives, to work with Busa. [3] Busa selected 179 texts centering around Thomas Aquinas to be put into machine-readable form. Of these, 118 were written by Aquinas, and the remaining 61 were either at one point mis-attributed to him or attempts to complete unfinished works that Aquinas had begun. [2]
A significant part of the project was the data entry, which was carried out meticulously by a team of female keypunch operators whose precision was instrumental in the success of the project. [4] The text was punched between 1950 and 1966. The operators worked in Gallarate, Italy, [5] [6] and the project peaked in size in 1962 with 70 workers. [7] After the punching was complete, the data was lemmatised in a semi-automatic process. [5]
The completed project indexed a total of 10,631,980 words in fifty-six volumes totalling over 70,000 pages, divided into ten volumes of indexes, followed by thirty-one volumes of concordances of Aquinas's works, eight volumes of concordances of related authors, and seven volumes that reprinted the source texts. [2] [8] The seven volumes reprinting the source texts were sold separately. [2] The first volume was published in 1974, [9] and publication was completed in 1980. The project used a total of 1,500 kilometres (930 mi) of tape, [10] and it took an estimated 10,000 hours of computer work and 1 million hours of human work to complete. [3] The Index was released on CD-ROM in 1992 and a website was launched in 2005. [10]
A review of the project published in Computers and the Humanities described it as "as innovative and fascinating a reference work as the technology that made it possible." [11] A 1993 review described the project as the "second largest printed work of this century". The same review called it "excessive" and asked what its purpose was, going on to describe it as "the most pedantic work ever written". [8] In 2020, The Economist described it as "the creation story of the digital humanities." [10] An article in Umanistica Digitale stated that "the project developed for the first time, methods for dealing with unstructured language". [12] It influenced projects such as Key Word in Context. [12] The project is also sometimes listed as one of the earliest instances of an e-book. [13]
Hypertext is text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typically activated by a mouse click, keypress set, or screen touch. Apart from text, the term "hypertext" is also sometimes used to describe tables, images, and other presentational content formats with integrated hyperlinks. Hypertext is one of the key underlying concepts of the World Wide Web, where Web pages are often written in the Hypertext Markup Language (HTML). As implemented on the Web, hypertext enables the easy-to-use publication of information over the Internet.
Lexicography is the study of lexicons and the art of compiling dictionaries. It is divided into two separate academic disciplines:
Corpus linguistics is an empirical method for the study of language by way of a text corpus. Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.
The year 1951 in science and technology involved some significant events, listed below.
The Summa contra Gentiles is one of the best-known treatises by Thomas Aquinas, written as four books between 1259 and 1265.
Josephine Louise Miles was an American poet and literary critic; the first woman tenured in the English department at the University of California, Berkeley. She wrote over a dozen books of poetry and several works of criticism. She was a foundational scholar of quantitative and computational methods, and is considered a pioneer of the field of digital humanities. Benjamin H. Lehman and Josephine Miles' interdepartmental "Prose Improvement Project" was the basis for James Gray's Bay Area Writing Project, which later became the National Writing Project. The "Prose Improvement Project" was one of the first efforts at creating a writing across the curriculum program.
Roberto Busa was an Italian Jesuit priest and one of the pioneers in the usage of computers for linguistic and literary analysis. He was the author of the Index Thomisticus, a complete lemmatization of the works of Saint Thomas Aquinas and of a few related authors.
A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context. Historically, concordances have been compiled only for works of special importance, such as the Vedas, Bible, Qur'an or the works of Shakespeare, James Joyce or classical Latin and Greek authors, because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era.
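The definition above can be illustrated with a minimal sketch in Python. It is not the method used for the Index Thomisticus or any other historical concordance: the function name `build_concordance`, the `width` parameter, and the sample sentence are all assumptions chosen for illustration. The sketch also indexes every word rather than only the principal ones, and it performs no lemmatisation.

```python
import re
from collections import defaultdict


def build_concordance(text, width=2):
    """Map each word to every context in which it occurs.

    `width` (a hypothetical parameter) is the number of neighbouring
    words kept on each side of the indexed word.
    """
    # Lowercase the text and keep runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    concordance = defaultdict(list)
    for i, word in enumerate(tokens):
        left = tokens[max(0, i - width):i]
        right = tokens[i + 1:i + 1 + width]
        # Upper-case the indexed word so it stands out in its context.
        context = " ".join(left + [word.upper()] + right)
        concordance[word].append(context)
    # Return the entries in alphabetical order of headword.
    return dict(sorted(concordance.items()))


# Illustrative sample text, not taken from the indexed corpus.
sample = "In principio erat Verbum et Verbum erat apud Deum"
concordance = build_concordance(sample)

for word, contexts in concordance.items():
    for context in contexts:
        print(f"{word:<10} {context}")
```

Each headword is printed once per occurrence alongside its surrounding words, which is the basic shape of a concordance entry. A fuller version would typically skip very common words and group inflected forms under one headword.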
Digital humanities (DH) is an area of scholarly activity at the intersection of computing or digital technologies and the disciplines of the humanities. It includes the systematic use of digital resources in the humanities, as well as the analysis of their application. DH can be defined as new ways of doing scholarship that involve collaborative, transdisciplinary, and computationally engaged research, teaching, and publishing. It brings digital tools and methods to the study of the humanities with the recognition that the printed word is no longer the main medium for knowledge production and distribution.
John of St. Thomas, O.P., born João Poinsot, was a Portuguese Dominican friar, Thomist theologian, and professor of philosophy. He is known for being an early theorist in the field of semiotics.
Digital classics is the application of the tools of digital humanities to the field of classics, or more broadly to the study of the ancient world.
Digital history is the use of digital media to further historical analysis, presentation, and research. It is a branch of the digital humanities and an extension of quantitative history, cliometrics, and computing. Digital history commonly takes one of two forms: digital public history, concerned primarily with engaging online audiences with historical content, or digital research methods that further academic research. Digital history outputs include digital archives, online presentations, data visualizations, interactive maps, timelines, audio files, and virtual worlds that make history more accessible to the user. Recent digital history projects focus on creativity, collaboration, and technical innovation, as well as text mining, corpus linguistics, network analysis, 3D modeling, and big data analysis. By utilizing these resources, the user can rapidly develop new analyses that can link to, extend, and bring to life existing histories.
An ebook, also spelled as e-book or eBook, is a book publication made available in electronic form, consisting of text, images, or both, readable on the flat-panel display of computers or other electronic devices. Although sometimes defined as "an electronic version of a printed book", some e-books exist without a printed equivalent. E-books can be read on dedicated e-reader devices, also on any computer device that features a controllable viewing screen, including desktop computers, laptops, tablets and smartphones.
The Alliance of Digital Humanities Organizations (ADHO) is a digital humanities umbrella organization formed in 2005 to coordinate the activities of several regional DH organizations, referred to as constituent organizations.
The Chinese Text Project is a digital library project that assembles collections of early Chinese texts. The name of the project in Chinese literally means "The Chinese Philosophical Book Digitization Project", showing its focus on books related to Chinese philosophy. It aims at providing accessible and accurate versions of a wide range of texts, particularly those relating to Chinese philosophy, and the site is credited with providing one of the most comprehensive and accurate collections of classical Chinese texts on the Internet, as well as being one of the most useful textual databases for scholars of early Chinese texts.
COCOA was an early text file utility and associated file format for digital humanities, then known as humanities computing. It was approximately 4000 punched cards of FORTRAN and created in the late 1960s and early 1970s at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Functionality included word-counting and concordance building.
Susan Hockey is an English computer scientist. She is Emeritus Professor of Library and Information Studies at University College London. She has written about the history of digital humanities, the development of text analysis applications, electronic textual mark-up, teaching computing in the humanities, and the role of libraries in managing digital resources. In 2014, University College London created a Digital Humanities lecture series in her honour.
Voyant Tools is an open-source, web-based application for performing text analysis. It supports scholarly reading and interpretation of texts or corpora, particularly by scholars in the digital humanities, but also by students and the general public. It can be used to analyze online texts or ones uploaded by users. Voyant has a large, international user base: in October 2016 alone, Voyant's main server had 81,686 page views originating from 156 countries, invoking the tool 1,173,252 times.
The Oxford Concordance Program (OCP) was first released in 1981 and was a result of a project started in 1978 by Oxford University Computing Services (OUCS) to create a machine independent text analysis program for producing word lists, indexes and concordances in a variety of languages and alphabets.
Lou Burnard is an internationally recognised expert in digital humanities, particularly in the area of text encoding and digital libraries. He was assistant director of Oxford University Computing Services (OUCS) from 2001 to September 2010, when he officially retired from OUCS. Before that, he was manager of the Humanities Computing Unit at OUCS for five years. He has worked in ICT support for research in the humanities since the 1990s. He was one of the founding editors of the Text Encoding Initiative (TEI) and continues to play an active part in its maintenance and development, as a consultant to the TEI Technical Council and as an elected TEI board member. He has played a key role in the establishment of many other activities and initiatives in this area, such as the UK Arts and Humanities Data Service and the British National Corpus, and has published and lectured widely. Since 2008 he has worked as a Member of the Conseil Scientifique for the CNRS-funded "Adonis" TGE.