Distant reading

Distant reading is an approach in literary studies that applies computational methods to literary data, usually derived from large digital libraries, for the purposes of literary history and theory. The term is collective and refers to a range of different computational methods of analysing literary data. Similar approaches include macroanalysis, cultural analytics, computational formalism, computational literary studies, quantitative literary studies, and algorithmic literary criticism.

History

The term "distant reading" is generally attributed to Franco Moretti and his 2000 article, Conjectures on World Literature. [1] In the article, Moretti proposed a mode of reading which included works outside of established literary canons, which he variously termed "the great unread" [2] and, elsewhere, "the Slaughterhouse of Literature". [3] The innovation it proposed, as far as literary studies was concerned, was that the method employed samples, statistics, paratexts, and other features not often considered within the ambit of literary analysis. Moretti also established a direct opposition to the theory and methods of close reading: "One thing for sure: it cannot mean the very close reading of very few texts—secularized theology, really ('canon'!)—that has radiated from the cheerful town of New Haven over the whole field of literary studies". [4]

However, Moretti initially conceived distant reading as the analysis of secondary literature, a roundabout way of getting to know more about primary literature: "[literary history] will become 'second-hand': a patchwork of other people's research, without a single direct textual reading". [2] Only later did the term distant reading (via Moretti and other scholars) come to be primarily identified with computational analysis of primary literary sources.

Despite the consensus about the origins of distant reading at the turn of the twenty-first century, Ted Underwood has traced a longer genealogy of the method, arguing that it has been elided in current discourse about distant reading. He writes that "distant reading has a largely distinct genealogy stretching back many decades before the advent of the internet—a genealogy that is not for the most part centrally concerned with computers". [5] Underwood emphasises a social-scientific dimension in this prehistory of distant reading, referring to particular examples in the work of Raymond Williams (from the 1960s) and Janice Radway (from the 1980s). Moretti's conception of literary evolution in Distant Reading is quite similar to the psychologist Colin Martindale's "scientific", computational, neo-Darwinist project of literary evolution (The Clockwork Muse, 1990), and the role of reading is downplayed by both Martindale and Moretti. According to Martindale, the principles of the evolution of art are based on statistical regularities rather than meaning, data or observation. "So far as the engines of history are concerned, meaning does not matter. In principle, one could study the history of a literary tradition without reading any of literature. ... the main virtue of the computerized content analysis methods I use is that they save one from actually having to read the literature" (p. 14).

This variety in the stated definitions and aims of distant reading is characteristic of its development since the turn of the twenty-first century, where it has come to encompass a variety of different methods and approaches, rather than representing a single or unified method of literary study.

Principles and practice

One of the central principles of distant reading is that literary history and literary criticism can be written without necessarily resorting to the kind of careful, sustained reading encounter with individual texts that is fundamental to close reading.

Commonly, distant reading is performed at scale, using a large collection of texts. However, some scholars have adopted the principles of distant reading in the analysis of a small number of texts or an individual text. [6] Distant reading often shares with the Annales school a focus on the analysis of long-term histories and trends. Empirical approaches to literary study are a regular characteristic of distant reading, and are often accompanied by a reliance on quantitative methods. Moretti has described the concept of 'operationalizing' as "absolutely central to the new field of computational criticism" [7] that includes distant reading. This principle, for Moretti, consists of "building a bridge from concepts to measurement, and then to the world" (p. 104), underscoring the combined interests of empirical and quantitative study at its heart. In practice, distant reading in the twenty-first century has been undertaken with the aid of computers (though Underwood has argued for prominent non-computational precursors [8]); however, some works combining scale and literary study have been described as "distant-reading-by-hand". [9]

Criticisms of distant reading

Stanley Fish takes a broad view of what he frames as problems of interpretation in the digital humanities, but the specific example he isolates for critique is informed by his impression of distant reading methodology: "first you run the numbers, and then you see if they prompt an interpretive hypothesis. The method, if it can be called that, is dictated by the capability of the tool". [10] In a similar vein, Stephen Marche focuses on the prospects for interpretation within the framework of computational literary analysis in an article which begins with the provocation, "[b]ig data is coming for your books". [11] Though he initially describes distant reading as the "most promising path, at least on the surface" [11] of the range of digital humanities methods he surveys, he concludes that the generalisations he perceives in the method are ineffective when "applied to literary questions proper". [11] Additional critiques of distant reading have come from postcolonial theorists. Gayatri Spivak is unconvinced by distant reading's claims to represent the perspectives of the "great unread", asking "[s]hould our only ambition be to create authoritative totalizing patterns depending on untested statements by small groups of people treated as native informants?". [12] Jonathan Arac questions the "unavowed imperialism of English" [13] in Moretti's work.

Examples

In "Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740–1850)" [14] Franco Moretti uses an early distant reading methodology to analyse certain changes in the titles of novels in the given period and country. In the absence of dedicated corpora of these novels' texts, Moretti argues that "titles are still the best way to go beyond the 1 percent of novels that make up the canon, and catch a glimpse of the literary field as a whole". [14] In the article, Moretti combines the results of quantitative analysis of these titles with contextual knowledge of literary history to address questions about the shortening of eighteenth-century novel titles, about the nature of very short novel titles, and about the relationship of novel titles to genres. For example, in Section I, he provides evidence of the decreasing length of titles across the time span, and links the phenomenon to the growth of the market for novels and the establishment of periodicals which regularly reviewed novels.
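The kind of measurement described above, title length tracked over time, can be illustrated with a minimal Python sketch. It is not Moretti's code or data: the `(year, title)` records are hypothetical placeholders invented for this illustration, and the function name `mean_title_length_by_decade` is likewise an assumption. The sketch groups titles by decade and reports the mean number of words per title.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, title) records, invented for illustration only.
TITLES = [
    (1741, "The Remarkable History of a Young Lady of Quality"),
    (1745, "Memoirs of a Gentleman"),
    (1812, "The Orphan"),
    (1818, "Old Friends"),
]

def mean_title_length_by_decade(records):
    """Return {decade: mean title length in words}, ordered by decade."""
    lengths = defaultdict(list)
    for year, title in records:
        decade = (year // 10) * 10  # e.g. 1745 -> 1740
        lengths[decade].append(len(title.split()))
    return {decade: mean(values) for decade, values in sorted(lengths.items())}

result = mean_title_length_by_decade(TITLES)
print(result)
```

Counting whitespace-separated words is only one possible measure of length; counting characters would be an equally simple choice.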

In "Why Literary Time is Measured in Minutes" [15] Ted Underwood asks "Why are short spans of time so central to our discipline? ... Why is experience measured in seconds or minutes more appropriately literary than experience measured in weeks or months?". [16] Methodologically, Underwood supplements theoretical ideas about the compression of fictional time with approaches from distant reading which model the average lengths of time described in 250-word portions of fiction across three centuries. Having also combined quantitative findings with close reading, Underwood concludes his article with a discussion of the integration of quantitative methods into literary study, writing: "I see close readings and statistical models not as competing epistemologies but as interlocking modes of interpretation that excel at different scales of analysis". [17]
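As a rough illustration of the segmentation step only, the following Python sketch splits a text into consecutive 250-word portions. It is an assumption-laden sketch rather than Underwood's procedure: the function name `fixed_word_chunks` and the synthetic sample text are hypothetical, words are taken to be whitespace-delimited tokens, and estimating the fictional time each portion describes is not attempted here.

```python
def fixed_word_chunks(text, size=250):
    """Split text into consecutive portions of `size` whitespace-delimited words.

    The final portion may be shorter than `size`.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Synthetic 600-word sample, standing in for a work of fiction.
sample = " ".join(f"w{i}" for i in range(600))
chunks = fixed_word_chunks(sample)
print([len(chunk.split()) for chunk in chunks])
```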

In their Literary Lab pamphlet, "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method", [18] Ryan Heuser and Long Le-Khac analyse word usage within their corpus to argue for a "systemic concretization of language and fundamental change in the social spaces of the novel". [19] Their analysis demonstrates a change in the way in which concrete detail is presented across the span of the nineteenth century, with an observable shift in the novel's narrative style "from telling to showing" [20] as the century develops. The findings tally with many literary-critical writings about the change in nineteenth-century narrative style from realism to modernism.
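Tracking word usage of this general kind can be sketched in Python as the share of a text's tokens that fall within a chosen group of words. This is not the Semantic Cohort Method itself: the word list `COHORT`, the function name `cohort_frequency`, and the two sample sentences are hypothetical and were invented for this illustration.

```python
import re

# Hypothetical cohort of "concrete" words, chosen for illustration only.
COHORT = {"hand", "door", "red", "stone"}

def cohort_frequency(text, cohort=COHORT):
    """Return the share of tokens in `text` that belong to `cohort`.

    Tokens are lowercase runs of letters and apostrophes; empty text gives 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in cohort for token in tokens) / len(tokens)

# Invented sample sentences standing in for earlier and later prose.
early = "Virtue and honour guided her conduct."
late = "She put her hand on the red door."
print(cohort_frequency(early), cohort_frequency(late))
```

Computed per text and averaged per decade, a measure like this would give a time series of how prominent the chosen words are across a corpus.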

Lauren F. Klein trains methods from computational linguistics and data visualisation on an archive of slavery, in her article, "The Image of Absence: Archival Silence, Data Visualization, and James Hemings", [21] in order to present examples of how distant reading can uncover and illuminate "the silences endemic to the archive of American slavery". [22] Searching for archival traces of James Hemings, Thomas Jefferson's enslaved chef, Klein juxtaposes visualisations of his presence with Jefferson's own charts and tables as the basis for a discussion of data visualisation as it relates to the construction of race.

The COST Action 'Distant Reading for European Literary History' [23] is a European networking project bringing together scholars interested in corpus building, quantitative text analysis, and European literary history. It aims to create a network of researchers jointly developing the distant reading resources and methods necessary to change the way European literary history is written. The objectives of the project include coordinating the creation of a multilingual European Literary Text Collection (ELTeC) [24] containing digital full-texts of novels in different European languages.

References

  1. Moretti, Franco (2000). "Conjectures on World Literature". New Left Review. 1.
  2. Moretti, Franco (2000). "Conjectures on World Literature". New Left Review. 1: 55.
  3. Moretti, Franco (2000). "The Slaughterhouse of Literature". Modern Language Quarterly. 61 (1): 207. doi:10.1215/00267929-61-1-207. S2CID 161329715.
  4. Moretti, Franco (2000). "The Slaughterhouse of Literature". Modern Language Quarterly. 61 (1): 208. doi:10.1215/00267929-61-1-207. S2CID 161329715.
  5. Underwood, Ted (2017). "A Genealogy of Distant Reading". Digital Humanities Quarterly. 11 (2).
  6. Eve, Martin Paul (2017). "Close Reading with Computers: Genre Signals, Parts of Speech, and David Mitchell's Cloud Atlas". SubStance. 46 (3). doi:10.3368/ss.46.3.76. S2CID 54614638.
  7. Moretti, Franco (2013). "'Operationalizing': Or, the Function of Measurement in Literary Theory". New Left Review. 84: 103.
  8. Underwood, Ted (2017). "A Genealogy of Distant Reading". Digital Humanities Quarterly. 11 (2).
  9. Pasanek, Brad (2015). Metaphors of Mind: An Eighteenth-Century Dictionary. Baltimore: Johns Hopkins University Press. ISBN 9781421416885.
  10. Fish, Stanley (23 Jan 2012). "Mind Your P's and B's: The Digital Humanities and Interpretation". New York Times.
  11. Marche, Stephen (28 Oct 2012). "Literature Is not Data: Against Digital Humanities". Los Angeles Review of Books.
  12. Spivak, Gayatri Chakravorty (2005). Death of a Discipline. Columbia University Press. pp. 107–8. ISBN 9780231129459.
  13. Arac, Jonathan (2002). "Anglo-Globalism?". New Left Review. 16: 44.
  14. Moretti, Franco (2009). "Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740–1850)". Critical Inquiry. 36 (1): 134–158. doi:10.1086/605619. JSTOR 10.1086/606125.
  15. Underwood, Ted (2018). "Why Literary Time is Measured in Minutes". ELH. 85 (2): 341–365. doi:10.1353/elh.2018.0013. hdl:2142/100076. S2CID 192215143.
  16. Underwood, Ted (2018). "Why Literary Time is Measured in Minutes". ELH. 85 (2): 342. doi:10.1353/elh.2018.0013. hdl:2142/100076. S2CID 192215143.
  17. Underwood, Ted (2018). "Why Literary Time is Measured in Minutes". ELH. 85 (2): 363. doi:10.1353/elh.2018.0013. hdl:2142/100076. S2CID 192215143.
  18. Heuser, Ryan; Le-Khac, Long (2012). "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method" (PDF). Pamphlets of the Stanford Literary Lab. 4.
  19. Heuser, Ryan; Le-Khac, Long (2012). "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method" (PDF). Pamphlets of the Stanford Literary Lab. 4: 2.
  20. Heuser, Ryan; Le-Khac, Long (2012). "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method" (PDF). Pamphlets of the Stanford Literary Lab. 4: 45.
  21. Klein, Lauren F. (2013). "The Image of Absence: Archival Silence, Data Visualization, and James Hemings". American Literature. 85 (4): 661–688. doi:10.1215/00029831-2367310.
  22. Klein, Lauren F. (2013). "The Image of Absence: Archival Silence, Data Visualization, and James Hemings". American Literature. 85 (4): 661. doi:10.1215/00029831-2367310.
  23. "Distant Reading for European Literary History". Distant Reading.
  24. "ELTeC: European Literary Text Collection". Distant Reading.