Coh-Metrix

Coh-Metrix is a computational tool that produces indices of the linguistic and discourse representations of a text. Developed by Arthur C. Graesser and Danielle S. McNamara, Coh-Metrix analyzes texts on a wide range of measures of cohesion, language, and readability.

Measurements

Coh-Metrix can be used in many different ways to investigate the cohesion of the explicit text and the coherence of the mental representation of the text. "Our definition of cohesion consists of characteristics of the explicit text that play some role in helping the reader mentally connect ideas in the text" (Graesser, McNamara, & Louwerse, 2003). The definition of coherence is the subject of much debate; theoretically, the coherence of a text is defined by the interaction between linguistic representations and knowledge representations. If coherence is defined as the characteristics of the text (i.e., aspects of cohesion) that are likely to contribute to the coherence of the mental representation, then Coh-Metrix provides indices of these cohesion characteristics.[1]
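
As a toy illustration of the kind of cohesion characteristic such indices quantify, the following Python sketch computes the mean word overlap between adjacent sentences. It is a minimal sketch of the general idea, not the actual Coh-Metrix implementation; the sentence splitting and tokenization are deliberately naive.

    import re

    def sentence_overlap_index(text):
        """Mean word overlap (Jaccard) between adjacent sentences.

        A crude stand-in for a referential-cohesion index: higher values
        indicate more explicit lexical ties between neighboring sentences.
        """
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        word_sets = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
        overlaps = [len(a & b) / len(a | b)
                    for a, b in zip(word_sets, word_sets[1:]) if a | b]
        return sum(overlaps) / len(overlaps) if overlaps else 0.0

    # Repeated referents ("nucleus", "DNA") raise the index.
    print(sentence_overlap_index(
        "The cell contains a nucleus. The nucleus stores DNA. DNA encodes genes."))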

According to an empirical study, the Coh-Metrix L2 Reading Index performs significantly better than traditional readability formulas at distinguishing levels of simplified second-language reading texts.[2]
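
The traditional formulas in such comparisons rely only on surface features such as sentence length and word length. For reference, the sketch below implements the public Flesch-Kincaid Grade Level formula in Python; the syllable counter is a rough heuristic, and none of this reproduces the Coh-Metrix L2 Reading Index itself.

    import re

    def count_syllables(word):
        """Approximate syllables as the number of vowel runs (a heuristic)."""
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text):
        """0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z]+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (0.39 * len(words) / max(1, len(sentences))
                + 11.8 * syllables / max(1, len(words)) - 15.59)

    print(round(flesch_kincaid_grade("The cat sat on the mat. It purred softly."), 2))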

Related Research Articles

Concept

Concepts are defined as abstract ideas. They are understood to be the fundamental building blocks underlying principles, thoughts and beliefs. They play an important role in all aspects of cognition. As such, concepts are studied by several disciplines, such as linguistics, psychology, and philosophy, and these disciplines are interested in the logical and psychological structure of concepts, and how they are put together to form thoughts and sentences. The study of concepts has served as an important flagship of an emerging interdisciplinary approach called cognitive science.

Plain text

In computing, plain text is a loose term for data that represent only characters of readable material but not its graphical representation nor other objects. It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects.

Readability is the ease with which a reader can understand a written text. In natural language, the readability of text depends on its content and its presentation. Researchers have used various factors to measure readability, such as sentence length, word length, and word frequency.

Intertextuality is the shaping of a text's meaning by another text, either through deliberate compositional strategies such as quotation, allusion, calque, plagiarism, translation, pastiche or parody, or by interconnections between similar or related works perceived by an audience or reader of the text. These references are sometimes made deliberately and depend on a reader's prior knowledge and understanding of the referent, but the effect of intertextuality is not always intentional and is sometimes inadvertent. Often associated with strategies employed by writers working in imaginative registers, intertextuality may now be understood as intrinsic to any text.

Academic writing

Academic writing or scholarly writing is nonfiction writing produced as part of academic work in accordance with the standards and disciplines of each academic subject, including reports on empirical fieldwork or research in facilities for the natural sciences or social sciences, monographs in which scholars analyze culture, propose new theories, or develop interpretations from archives, as well as undergraduate versions of all of these.

Text linguistics is a branch of linguistics that deals with texts as communication systems. Its original aims lay in uncovering and describing text grammars. The application of text linguistics has, however, evolved from this approach to a point at which text is viewed in much broader terms that go beyond a mere extension of traditional grammar towards an entire text. Text linguistics takes into account the form of a text, but also its setting, i.e., the way in which it is situated in an interactional, communicative context. Both the author of a text and its addressee are taken into consideration in their respective roles in the specific communicative context. In general, it is an application of discourse analysis at the much broader level of text, rather than just a sentence or word.

Coherence in linguistics is what makes a text semantically meaningful. It is especially dealt with in text linguistics. Coherence is achieved through syntactical features such as the use of deictic, anaphoric and cataphoric elements or a logical tense structure, as well as presuppositions and implications connected to general world knowledge. The purely linguistic elements that make a text coherent are subsumed under the term cohesion.

Representation (arts)

Representation is the use of signs that stand in for and take the place of something else. It is through representation that people organize the world and reality through the act of naming its elements. Signs are arranged in order to form semantic constructions and express relations.

A lexical chain is a sequence of semantically related words in writing, spanning short or long distances. A chain is independent of the grammatical structure of the text; in effect, it is a list of words that captures a portion of the cohesive structure of the text. A lexical chain can provide a context for the resolution of an ambiguous term and enable identification of the concept that the term represents.
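
The following Python sketch illustrates the idea, using WordNet path similarity from the nltk library to decide whether two nouns are related. The 0.2 threshold is an arbitrary choice for illustration; real lexical-chain algorithms use richer relations and word-sense disambiguation.

    # Requires: pip install nltk, then: python -m nltk.downloader wordnet
    from nltk.corpus import wordnet as wn

    def related(word1, word2, threshold=0.2):
        """True if any noun senses of the two words are close in WordNet."""
        return any((s1.path_similarity(s2) or 0.0) >= threshold
                   for s1 in wn.synsets(word1, pos=wn.NOUN)
                   for s2 in wn.synsets(word2, pos=wn.NOUN))

    def lexical_chains(words):
        """Greedily add each word to the first chain containing a related word."""
        chains = []
        for word in words:
            for chain in chains:
                if any(related(word, member) for member in chain):
                    chain.append(word)
                    break
            else:
                chains.append([word])
        return chains

    # Exact grouping depends on the WordNet version and the threshold.
    print(lexical_chains(["car", "vehicle", "engine", "banana", "fruit"]))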

Multiliteracy is an approach to literacy theory and pedagogy coined in the mid-1990s by the New London Group. The approach is characterized by two key aspects of literacy: linguistic diversity and multimodal forms of linguistic expression and representation. The term was coined in response to two major changes in the globalized environment. One such change was the growing linguistic and cultural diversity due to increased transnational migration. The second major change was the proliferation of new mediums of communication due to advances in communication technologies, e.g., the internet, multimedia, and digital media. As a scholarly approach, multiliteracy focuses on the new "literacy" that is developing in response to the changes in the way people communicate globally due to technological shifts and the interplay between different cultures and languages.

A discourse relation is a description of how two segments of discourse are logically and/or structurally connected to one another.

Lexile

The Lexile Framework for Reading is an educational tool that uses a measure called a Lexile to match readers with books, articles and other leveled reading resources. Readers and books are assigned a score on the Lexile scale, in which lower scores reflect easier readability for books and lower reading ability for readers. The Lexile framework uses quantitative methods, based on individual words and sentence lengths, rather than qualitative analysis of content to produce scores. Accordingly, the scores for texts do not reflect factors such as multiple levels of meaning or maturity of themes. Hence, the United States Common Core State Standards recommend the use of alternative, qualitative methods for selecting books for students at grade 6 and over. In the US, Lexile measures are reported annually from reading programs and assessments, and about half of U.S. students in grades 3 through 12 receive a Lexile measure each year. In addition to being used in schools in all 50 states, Lexile measures are also used outside of the United States.

Literariness is the organisation of language which through special linguistic and formal properties distinguishes literary texts from non-literary texts. The defining features of a literary work do not reside in extraliterary conditions such as history or sociocultural phenomena under which a literary text might have been created but in the form of the language that is used. Thus, literariness is defined as being the feature that makes a given work a literary work. It distinguishes a literary work from ordinary texts by using certain artistic devices such as metre, rhyme, and other patterns of sound and repetition.

Rhetorical structure theory

Rhetorical structure theory (RST) is a theory of text organization that describes relations that hold between parts of text. It was originally developed by William Mann, Sandra Thompson, Christian M.I.M. Matthiessen and others at the University of Southern California's Information Sciences Institute (ISI) and defined in a 1988 paper. The theory was developed as part of studies of computer-based text generation. Natural language researchers later began using RST in text summarization and other applications. It explains coherence by postulating a hierarchical, connected structure of texts. In 2000, Daniel Marcu, also of ISI, demonstrated that practical discourse parsing and text summarization also could be achieved using RST.
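
A common way to make this hierarchical structure concrete is a tree whose internal nodes are relations linking a nucleus (the more central span) to a satellite. The Python sketch below shows one such data structure; the relation name and example texts are invented for illustration, and this is not a discourse parser.

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Span:
        """A leaf: an elementary discourse unit (a clause-sized text span)."""
        text: str

    @dataclass
    class Relation:
        """An internal node linking a nucleus span to a satellite span."""
        name: str  # e.g. "Evidence", "Elaboration", "Concession"
        nucleus: Union["Span", "Relation"]
        satellite: Union["Span", "Relation"]

    tree = Relation(
        name="Evidence",
        nucleus=Span("The new index predicts text difficulty well."),
        satellite=Span("It outperformed older formulas in one study."),
    )
    print(tree.name, "->", tree.nucleus.text)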

Arthur C. Graesser is a professor of psychology and intelligent systems at the University of Memphis and is an honorary research fellow in education at the University of Oxford.

Scott Andrew Crossley is an American linguist. He is a professor of applied linguistics at Vanderbilt University, United States. His research focuses on natural language processing and the application of computational tools and machine learning algorithms in learning analytics including second language acquisition, second language writing, and readability. His main interest area is the development and use of natural language processing tools in assessing writing quality and text difficulty.

Danielle S. McNamara is an educational researcher known for her theoretical and empirical work with reading comprehension and the development of game-based literacy technologies. She is Professor of Psychology and Senior Research Scientist at Arizona State University. She has previously held positions at University of Memphis, Old Dominion University, and University of Colorado, Boulder.

References

  1. Graesser, A. C., McNamara, D. S., & Louwerse, M. M. (2003). What do readers need to learn in order to process coherence relations in narrative and expository text. In A. P. Sweet & C. E. Snow (Eds.), Rethinking reading comprehension (pp. 82–98). New York: Guilford Publications.
  2. Crossley, S. A., Allen, D., & McNamara, D. S. (2011). Text readability and intuitive simplification: A comparison of readability formulas. Reading in a Foreign Language, 23(1), 86–101. ISSN 1539-0578.