Rhetorical structure theory


Rhetorical structure theory (RST) is a theory of text organization that describes the relations that hold between parts of a text. It was originally developed by William Mann, Sandra Thompson, Christian M. I. M. Matthiessen and others at the University of Southern California's Information Sciences Institute (ISI) and defined in a 1988 paper. [1] [2] [3] The theory was developed as part of studies of computer-based text generation. Natural language researchers later began using RST in text summarization and other applications. It explains coherence by postulating a hierarchical, connected structure of texts. [3] In 2000, Daniel Marcu, also of ISI, demonstrated that practical discourse parsing and text summarization could also be achieved using RST. [4] [5] [6]


Rhetorical relations

Rhetorical relations, also called coherence relations or discourse relations, are paratactic (coordinate) or hypotactic (subordinate) relations that hold across two or more text spans. [7] It is widely accepted that the coherence of a text arises from relations of this kind. By using rhetorical relations, RST provides a systematic way for an analyst to analyse a text. An analysis is usually built by reading the text and constructing a tree using the relations. The following example is a title and summary appearing at the top of an article in Scientific American magazine (Ramachandran and Anstis, 1986). The original text, broken into numbered units, is: [3]

Figure: Diagram of RST analysis

  1. [Title:] The Perception of Apparent Motion
  2. [Abstract:] When the motion of an intermittently seen object is ambiguous
  3. the visual system resolves confusion
  4. by applying some tricks that reflect a built-in knowledge of properties of the physical world

In the figure, the numbers 1, 2, 3 and 4 refer to the units listed above. The third and fourth units form a "Means" relation. The third unit is the essential part of this relation, so it is called the nucleus of the relation, and the fourth unit is called the satellite. Similarly, the second unit stands in a "Condition" relation to the span formed by the third and fourth units. All units are also spans, and spans may be composed of more than one unit.
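
The analysis described above can be written down as a small tree data structure. The following is a minimal sketch in Python, not part of RST itself: the class names `Unit` and `Relation` and the helper `units` are hypothetical, chosen only for illustration, and the title (unit 1) is left out because the text above does not say which relation attaches it.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Unit:
    """An elementary text unit, identified by its number in the example."""
    number: int
    text: str


@dataclass
class Relation:
    """A nucleus-satellite relation; either side may itself be a larger span."""
    name: str
    nucleus: Union[Unit, "Relation"]
    satellite: Union[Unit, "Relation"]


def units(span: Union[Unit, Relation]) -> List[int]:
    """Return the numbers of all units covered by a span, in ascending order."""
    if isinstance(span, Unit):
        return [span.number]
    return sorted(units(span.nucleus) + units(span.satellite))


u2 = Unit(2, "When the motion of an intermittently seen object is ambiguous")
u3 = Unit(3, "the visual system resolves confusion")
u4 = Unit(4, "by applying some tricks that reflect a built-in knowledge "
             "of properties of the physical world")

# Unit 3 is the nucleus and unit 4 the satellite of "Means".
means = Relation("Means", nucleus=u3, satellite=u4)
# Unit 2 is the satellite of "Condition"; the span 3-4 is its nucleus.
condition = Relation("Condition", nucleus=means, satellite=u2)
```

Here `units(means)` covers units 3 and 4, and `units(condition)` covers units 2 to 4, which mirrors the statement that a span may be composed of more than one unit.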

Nuclearity in discourse

RST distinguishes two types of units. Nuclei are considered the most important parts of a text, whereas satellites contribute to the nuclei and are secondary. The nucleus contains the basic information, and the satellite contains additional information about the nucleus. A satellite is often incomprehensible without its nucleus, whereas a text from which the satellites have been deleted can still be understood to a certain extent.
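
The effect of deleting satellites can be illustrated with a short Python sketch. It assumes, purely for illustration, that every relation has exactly one nucleus and one satellite (coordinate relations are ignored), and the names `Relation` and `nuclei_only` are hypothetical.

```python
from typing import NamedTuple, Union


class Relation(NamedTuple):
    """A nucleus-satellite relation; each side is plain text or a nested relation."""
    name: str
    nucleus: Union[str, "Relation"]
    satellite: Union[str, "Relation"]


def nuclei_only(span: Union[str, Relation]) -> str:
    """Discard every satellite and return the text of the remaining nucleus."""
    while isinstance(span, Relation):
        span = span.nucleus
    return span


means = Relation(
    "Means",
    nucleus="the visual system resolves confusion",
    satellite="by applying some tricks that reflect a built-in knowledge "
              "of properties of the physical world",
)
condition = Relation(
    "Condition",
    nucleus=means,
    satellite="When the motion of an intermittently seen object is ambiguous",
)

print(nuclei_only(condition))  # prints "the visual system resolves confusion"
```

The remaining clause is still intelligible by itself, whereas either satellite ("When the motion ... is ambiguous" or "by applying some tricks ...") is hard to interpret without it.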

Hierarchy in the analysis

RST relations are applied recursively in a text until all units in that text are constituents in an RST relation. As a result, RST structures are typically represented as trees, with one top-level relation that encompasses other relations at lower levels.
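
This recursive structure can be sketched in Python. The sketch assumes a hypothetical encoding in which a relation is a tuple `(name, nucleus, satellite)` and a unit is its integer number; the functions `outline` and `covered` are illustrative names, not an established API.

```python
def outline(node, role="root", depth=0):
    """Return the tree as indented lines, one per relation or unit."""
    indent = "  " * depth
    if isinstance(node, int):
        return [f"{indent}{role}: unit {node}"]
    name, nucleus, satellite = node
    lines = [f"{indent}{role}: {name}"]
    lines += outline(nucleus, "nucleus", depth + 1)
    lines += outline(satellite, "satellite", depth + 1)
    return lines


def covered(node):
    """Return the set of unit numbers that are constituents of some relation."""
    if isinstance(node, int):
        return {node}
    _, nucleus, satellite = node
    return covered(nucleus) | covered(satellite)


# "Condition" is the top-level relation; it encompasses "Means" one level down.
tree = ("Condition", ("Means", 3, 4), 2)

print("\n".join(outline(tree)))
# root: Condition
#   nucleus: Means
#     nucleus: unit 3
#     satellite: unit 4
#   satellite: unit 2
```

An analysis of units 2 to 4 is complete in the sense described above when `covered(tree)` equals `{2, 3, 4}`, that is, when every unit is a constituent of some relation under the single top-level node.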

Why RST?

  1. From a linguistic point of view, RST proposes a different view of text organization than most linguistic theories.
  2. RST points to a tight connection between relations and coherence in text.
  3. From a computational point of view, it provides a characterization of text relations that has been implemented in different systems and for applications such as text generation [8] and summarization. [9]

In design rationale

Computer scientists Ana Cristina Bicharra Garcia and Clarisse Sieckenius de Souza have used RST as the basis of a design rationale system called ADD+. [10] [11] In ADD+, RST is used as the basis for the rhetorical organization of a knowledge base, in a way comparable to other knowledge representation systems such as the issue-based information system (IBIS). [11] Similarly, RST has been used in representation schemes for argumentation. [12] [13] [14]


References

  1. Mann, William C.; Thompson, Sandra A. (1988). "Rhetorical structure theory: toward a functional theory of text organization" (PDF). Text: Interdisciplinary Journal for the Study of Discourse. 8 (3): 243–281. doi:10.1515/text.1.1988.8.3.243. S2CID 60514661. Retrieved 1 November 2017.
  2. Matthiessen, Christian M. I. M. (June 2005). "Remembering Bill Mann". Computational Linguistics. 31 (2): 161–171. doi:10.1162/0891201054224002. S2CID 19688915. Retrieved 1 November 2017.
  3. Taboada, Maite; Mann, William C. (June 2006). "Rhetorical structure theory: looking back and moving ahead" (PDF). Discourse Studies. 8 (3): 423–459. CiteSeerX 10.1.1.216.381. doi:10.1177/1461445606061881. S2CID 2386531.
  4. Marcu, Daniel (2000). The theory and practice of discourse parsing and summarization. Cambridge, Mass.: MIT Press. ISBN 978-0262133722. OCLC 43811223.
  5. Carlson, Lynn; Marcu, Daniel; Okurowski, Mary Ellen (2003) [2001]. "Building a discourse-tagged corpus in the framework of rhetorical structure theory" (PDF). In Kuppevelt, Jan van; Smith, Ronnie W. (eds.). Current and new directions in discourse and dialogue. Text, speech, and language technology. Vol. 22. Dordrecht; Boston: Kluwer Academic Publishers. pp. 85–112. doi:10.1007/978-94-010-0019-2_5. ISBN 978-1402016141. OCLC 53097055.
  6. "Timeline". isi.edu. Information Sciences Institute. Retrieved 1 November 2017.
  7. Taboada, Maite (2009). "Implicit and explicit coherence relations" (PDF). In Renkema, Jan (ed.). Discourse, of course: an overview of research in discourse studies. Amsterdam; Philadelphia: John Benjamins Publishing Company. pp. 127–140. doi:10.1075/z.148.13tab. ISBN 9789027232588. OCLC 276996573.
  8. "RST and text generation". ccl.pku.edu.cn. Retrieved 1 November 2017.
  9. Uzêda, Vinícius Rodrigues; Pardo, Thiago Alexandre Salgueiro; Nunes, Maria das Graças Volpe (November 2008). "Evaluation of automatic text summarization methods based on rhetorical structure theory" (PDF). Eighth International Conference on Intelligent Systems Design and Applications: Kaohsiung, Taiwan, 26–28 November 2008. ISDA'08. Vol. 2. Piscataway, NJ: IEEE. pp. 389–394. doi:10.1109/ISDA.2008.289. ISBN 978-0-7695-3382-7. S2CID 16331006. Retrieved 1 November 2017.
  10. Garcia, Ana Cristina Bicharra; Souza, Clarisse Sieckenius de (April 1997). "ADD+: Including rhetorical structures in active documents" (PDF). Artificial Intelligence for Engineering Design, Analysis and Manufacturing. 11 (2): 109–124. doi:10.1017/S0890060400001906.
  11. Regli, William C.; Hu, Xiaochun; Atwood, Michael; Sun, Wei (December 2000). "A survey of design rationale systems: approaches, representation, capture and retrieval" (PDF). Engineering with Computers. 16 (3–4): 209–235. doi:10.1007/PL00013715. S2CID 6394458.
  12. Green, Nancy L. (August 2009). "Representation of argumentation in text with rhetorical structure theory". Argumentation. 24 (2): 181–196. doi:10.1007/s10503-009-9169-4. S2CID 145388742.
  13. Green, Nancy L. (November 2015). "Annotating evidence-based argumentation in biomedical text". 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA, 9–12 November 2015. Piscataway, NJ: IEEE. pp. 922–929. doi:10.1109/BIBM.2015.7359807. ISBN 978-1-4673-6799-8. OCLC 972619754. S2CID 7821394.
  14. Mitrović, Jelena; O'Reilly, Cliff; Mladenović, Miljana; Handschuh, Siegfried (January 2017). "Ontological representations of rhetorical figures for argument mining". Argument & Computation. 8 (3): 267–287. doi:10.3233/AAC-170027.