Lexical chain

A lexical chain is a sequence of semantically related words in a text, spanning either a narrow context window (adjacent words or sentences) or a wide one (the entire text). [1] A lexical chain is independent of the grammatical structure of the text; in effect, it is a list of words that captures a portion of the cohesive structure of the text. A lexical chain can provide a context for the resolution of an ambiguous term and enable disambiguation of the concepts that the term represents. For example, in a text about Rome, the words Rome, capital, and city may form a lexical chain.

About

Morris and Hirst [1] introduce the term lexical chain as an extension of lexical cohesion. [2] A text in which many sentences are semantically connected often exhibits a degree of continuity in its ideas, providing good cohesion among its sentences. The definition used for lexical cohesion states that coherence is a result of cohesion, not the other way around. [2] [3] Cohesion relates a set of words that belong together because of an abstract or concrete relation; coherence, on the other hand, is concerned with the actual meaning of the whole text. [1]

Morris and Hirst [1] state that lexical chains make use of the semantic context for interpreting words, concepts, and sentences. In contrast, lexical cohesion is more focused on the relationships of word pairs; lexical chains extend this notion to a series of adjacent words. There are two main reasons why lexical chains are essential: [1]

- they provide a feasible context to aid in resolving ambiguity and in narrowing a word to a specific meaning; and
- they provide clues for determining coherence and discourse, and thus a deeper semantic-structural meaning of the text.

The method presented by Morris and Hirst [1] is the first to bring the concept of lexical cohesion to computer systems via lexical chains. Using their intuition, they identified lexical chains in text documents and built their structure following Halliday and Hasan's [2] observations. For this task, they considered five text documents, totaling 183 sentences, from different and non-specific sources. Repetitive words (e.g., high-frequency words, pronouns, prepositions, verbal auxiliaries) were not considered as prospective chain elements, since they do not bring much semantic value to the structure themselves.

Lexical chains are built according to a series of relationships between words in a text document. In their seminal work, Morris and Hirst [1] use an external thesaurus (Roget's Thesaurus) as the lexical database from which to extract these relations. A lexical chain is formed by a sequence of words appearing in this order, such that any two consecutive words exhibit certain properties (i.e., attributes such as category, indexes, and pointers in the lexical database). [1] [4]
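As a rough illustration, the sketch below builds chains greedily over a word sequence. It is a simplification rather than Morris and Hirst's algorithm: WordNet (through NLTK) stands in for Roget's Thesaurus, and relatedness is reduced to sharing a synset or a direct hypernym/hyponym link.

```python
# Minimal sketch of greedy lexical chain construction.
# Assumptions: WordNet (via NLTK) replaces Roget's Thesaurus, and "related"
# means sharing a synset or a direct hypernym/hyponym link; the thesaural
# relations used by Morris and Hirst are richer than this.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def related(word_a, word_b):
    """True if the words share a synset or a direct hypernym/hyponym link."""
    for sa in wn.synsets(word_a):
        for sb in wn.synsets(word_b):
            if sa == sb or sb in sa.hypernyms() or sb in sa.hyponyms():
                return True
    return False

def build_chains(words):
    """Append each word to the first chain whose last word it relates to."""
    chains = []
    for word in words:
        for chain in chains:
            if related(chain[-1], word):
                chain.append(word)
                break
        else:
            chains.append([word])  # no related chain found: start a new one
    return chains

print(build_chains(["dog", "puppy", "economy"]))
# -> [['dog', 'puppy'], ['economy']] (grouping depends on the WordNet version)
```

In practice, systems also track which sense of each word joined a chain, so that chain membership disambiguates the word, and they use richer relations and distance constraints than this sketch.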

Approaches and Methods

The use of lexical chains in natural language processing tasks (e.g., text similarity, word sense disambiguation, document clustering) has been widely studied in the literature. Barzilay et al. [5] use lexical chains to produce summaries from texts. They propose a technique based on four steps: segmentation of the original text, construction of lexical chains, identification of reliable chains, and extraction of significant sentences. Silber and McCoy [6] also investigate text summarization, but their approach for constructing the lexical chains runs in linear time.

Some authors use WordNet [7] [8] to improve the search and evaluation of lexical chains. Budanitsky and Hirst [9] [10] compare several measures of semantic distance and relatedness using lexical chains in conjunction with WordNet. Their study concludes that the similarity measure of Jiang and Conrath [11] presents the best overall result. Moldovan and Novischi [12] study the use of lexical chains for finding topically related words for question answering systems, considering the glosses of each synset in WordNet. According to their findings, topical relations via lexical chains improve the performance of question answering systems when combined with WordNet. McCarthy et al. [13] present a methodology to categorize and find the most predominant synsets in unlabeled texts using WordNet. In contrast to traditional approaches (e.g., bag-of-words), they consider relationships between terms that do not occur explicitly. Ercan and Cicekli [14] explore the effects of lexical chains in the keyword extraction task from a supervised machine learning perspective. Wei et al. [15] combine lexical chains and WordNet to extract a set of semantically related words from texts and use them for clustering. Their approach uses an ontological hierarchical structure to provide a more accurate assessment of similarity between terms during the word sense disambiguation task.
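For instance, the Jiang and Conrath measure cited above can be computed over WordNet with NLTK. The following is a minimal sketch (not Budanitsky and Hirst's experimental setup), assuming the Brown-corpus information-content file as one common choice.

```python
# Minimal sketch: Jiang-Conrath similarity between two WordNet senses.
# Requires nltk.download('wordnet') and nltk.download('wordnet_ic').
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # information content from the Brown corpus

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print(dog.jcn_similarity(cat, brown_ic))  # higher values mean more similar senses
```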

Lexical Chain and Word Embedding

Even though the applicability of lexical chains is diverse, there is little work exploring them with recent advances in NLP, more specifically with word embeddings. In [16], lexical chains are built using specific patterns found in WordNet [7] and used for learning word embeddings. The resulting vectors are validated in the document similarity task. Gonzales et al. [17] use word-sense embeddings to produce lexical chains that are integrated with a neural machine translation model. Mascarell [18] proposes a model that uses lexical chains to leverage statistical machine translation by using a document encoder. Instead of using an external lexical database, they use word embeddings to detect the lexical chains in the source text.

Ruas et al. [4] propose two techniques that combine lexical databases, lexical chains, and word embeddings, namely Flexible Lexical Chain II (FLLC II) and Fixed Lexical Chain II (FXLC II). The main goal of both FLLC II and FXLC II is to represent a collection of words by their semantic values more concisely. In FLLC II, the lexical chains are assembled dynamically according to the semantic content of each term evaluated and its relationship with its adjacent neighbors. As long as a semantic relation connects two or more words, they are combined into a unique concept. The semantic relationship is obtained through WordNet, which works as a ground truth indicating which lexical structure connects two words (e.g., hypernyms, hyponyms, meronyms). If a word arrives that has no semantic affinity with the current chain, a new lexical chain is initialized. FXLC II, on the other hand, breaks text segments into pre-defined chunks with a specific number of words each. Unlike FLLC II, the FXLC II technique groups a certain number of words into the same structure, regardless of the semantic relatedness expressed in the lexical database. In both methods, each formed chain is represented by the word whose pre-trained word embedding vector is most similar to the average vector of the constituent words of that same chain.
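The last step admits a compact description: the chain representative is the constituent word whose vector has the highest cosine similarity to the chain's mean vector. The sketch below illustrates it with made-up 3-dimensional vectors; the `embeddings` lookup is a hypothetical word-to-vector table, not part of Ruas et al.'s implementation.

```python
# Minimal sketch of the chain-representative step: pick the chain word whose
# embedding is closest (by cosine similarity) to the mean of the chain vectors.
import numpy as np

def chain_representative(chain, embeddings):
    vectors = np.array([embeddings[w] for w in chain])
    centroid = vectors.mean(axis=0)
    # cosine similarity of each word vector to the centroid
    sims = vectors @ centroid / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid)
    )
    return chain[int(np.argmax(sims))]

# Toy 3-dimensional vectors (made up for illustration); real systems use
# pre-trained embeddings with hundreds of dimensions.
embeddings = {
    "car":     np.array([0.9, 0.1, 0.0]),
    "vehicle": np.array([0.8, 0.2, 0.1]),
    "truck":   np.array([0.7, 0.3, 0.0]),
}
print(chain_representative(["car", "vehicle", "truck"], embeddings))  # -> 'vehicle'
```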

Related Research Articles

<span class="mw-page-title-main">Semantic network</span> Knowledge base that represents semantic relations between concepts in a network

A semantic network, or frame network, is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map. Typical standardized semantic networks are expressed as semantic triples.
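As a small illustration of the triple representation, a semantic network can be stored as a set of (subject, relation, object) edges; the concept and relation names below are made up for the example.

```python
# A tiny semantic network as (subject, relation, object) triples.
triples = {
    ("canary", "is_a", "bird"),
    ("bird", "has_part", "wing"),
    ("bird", "is_a", "animal"),
}

def outgoing(node):
    """Relations and concepts reachable from `node` by one directed edge."""
    return {(rel, obj) for subj, rel, obj in triples if subj == node}

print(outgoing("bird"))  # {('is_a', 'animal'), ('has_part', 'wing')} (set order varies)
```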

<span class="mw-page-title-main">WordNet</span> Computational lexicon of English

WordNet is a lexical database that links English words through semantic relations such as synonymy, hyponymy, and meronymy. The synonyms are grouped into synsets with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and a thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. It was first created in English; the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website. There are now WordNets in more than 200 languages.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious and automatic, but it can come to conscious attention when ambiguity impairs the clarity of communication, given the pervasive polysemy of natural language. In computational linguistics, it is an open problem that affects other tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.

<span class="mw-page-title-main">Hyponymy and hypernymy</span> Semantic relations involving the type-of property

Hyponymy and hypernymy are the semantic relations between a more specific term (the hyponym, or subtype) and a more general term (the hypernym, or supertype). The semantic field of the hyponym is included within that of the hypernym. For example, pigeon, crow, and eagle are all hyponyms of bird, their hypernym.

Lexical semantics, as a subfield of linguistic semantics, is the study of word meanings. It includes the study of how words structure their meaning, how they act in grammar and compositionality, and the relationships between the distinct senses and uses of a word.

Readability is the ease with which a reader can understand a written text. In natural language, the readability of text depends on its content and its presentation. Researchers have used various factors of both kinds to measure readability.

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".
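The distinction can be made concrete with WordNet, whose noun hierarchy encodes "is a" relations; the sketch below uses NLTK's path similarity as one of many possible measures, and the exact scores depend on the WordNet version.

```python
# Minimal illustration: an "is a"-based measure typically scores car/bus
# higher than car/road, even though car and road are clearly related.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

car = wn.synset('car.n.01')
bus = wn.synset('bus.n.01')
road = wn.synset('road.n.01')
print(car.path_similarity(bus))   # cars and buses are both vehicles
print(car.path_similarity(road))  # roads sit in a different branch of the hierarchy
```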

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.

<span class="mw-page-title-main">Distributional semantics</span> Field of linguistics

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

Text simplification is an operation used in natural language processing to change, enhance, classify, or otherwise process an existing body of human-readable text so that its grammar and structure are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because of communication needs in an increasingly complex and interconnected world dominated by science, technology, and new media. Natural human languages, however, pose huge problems because they ordinarily contain large vocabularies and complex constructions that machines, no matter how fast and well-programmed, cannot easily process. Researchers have discovered that, to reduce linguistic diversity, they can use methods of semantic compression to limit and simplify the set of words used in given texts.

In natural language processing, semantic role labeling is the process of assigning labels to words or phrases in a sentence that indicate their semantic role in the sentence, such as that of an agent, goal, or result.

Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the following text: "After the match, replace any remaining fluid deficit to prevent chronic dehydration throughout the tournament", a substitute of game might be given.

SemEval is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

In natural language processing, textual entailment (TE), also known as natural language inference (NLI), is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text.

In natural language processing, semantic compression is a process of compacting a lexicon used to build a textual document by reducing language heterogeneity, while maintaining text semantics. As a result, the same ideas can be represented using a smaller set of words.

<span class="mw-page-title-main">Word embedding</span> Method in natural language processing

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

<span class="mw-page-title-main">Word2vec</span> Models used to produce word embeddings

Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector. The vectors are chosen carefully such that they capture the semantic and syntactic qualities of words; as such, a simple mathematical function can indicate the level of semantic similarity between the words represented by those vectors.
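As an illustration of the technique, the sketch below trains a tiny word2vec model with the Gensim library; the library choice, corpus, and hyperparameters are assumptions for the example, and similarities learned from such a small corpus are noisy.

```python
# Minimal sketch: training word2vec on a toy corpus with Gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["king"].shape)                # one 50-dimensional vector per word
print(model.wv.similarity("king", "queen"))  # cosine similarity of two word vectors
```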

UBY is a large-scale lexical-semantic resource for natural language processing (NLP) developed at the Ubiquitous Knowledge Processing Lab (UKP) in the department of Computer Science of the Technische Universität Darmstadt. UBY is based on the ISO standard Lexical Markup Framework (LMF) and combines information from several expert-constructed and collaboratively constructed resources for English and German.

Paraphrase or paraphrasing in computational linguistics is the natural language processing task of detecting and generating paraphrases. Applications of paraphrasing are varied including information retrieval, question answering, text summarization, and plagiarism detection. Paraphrasing is also useful in the evaluation of machine translation, as well as semantic parsing and generation of new samples to expand existing corpora.

<span class="mw-page-title-main">Sentence embedding</span>

In natural language processing, a sentence embedding refers to a numeric representation of a sentence in the form of a vector of real numbers which encodes meaningful semantic information.

References

  1. Morris, Jane; Hirst, Graeme (1991-03-01). "Lexical cohesion computed by thesaural relations as an indicator of the structure of text". Computational Linguistics.
  2. Halliday, Michael Alexander Kirkwood; Hasan, Ruqaiya (1976). Cohesion in English. London: Longman. ISBN 0-582-55031-9. OCLC 2323723.
  3. Carrell, Patricia L. (1982). "Cohesion Is Not Coherence". TESOL Quarterly. 16 (4): 479–488. doi:10.2307/3586466. ISSN 0039-8322. JSTOR 3586466.
  4. Ruas, Terry; Ferreira, Charles Henrique Porto; Grosky, William; de França, Fabrício Olivetti; de Medeiros, Débora Maria Rossi (2020-09-01). "Enhanced word embeddings using multi-semantic representation through lexical chains". Information Sciences. 532: 16–32. arXiv:2101.09023. doi:10.1016/j.ins.2020.04.048. ISSN 0020-0255. S2CID 218954068.
  5. Barzilay, Regina; McKeown, Kathleen R.; Elhadad, Michael (1999). "Information fusion in the context of multi-document summarization". Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics: 550–557. doi:10.3115/1034678.1034760. ISBN 1558606092.
  6. Silber, Gregory; McCoy, Kathleen (2001). "Efficient text summarization using lexical chains". Proceedings of the 5th International Conference on Intelligent User Interfaces: 252–255. doi:10.1145/325737.325861. S2CID 8403554.
  7. "WordNet | A Lexical Database for English". wordnet.princeton.edu. Retrieved 2020-05-20.
  8. Fellbaum, Christiane, ed. (1998). WordNet: An Electronic Lexical Database. Cambridge, Mass.: MIT Press. ISBN 0-262-06197-X. OCLC 38104682.
  9. Budanitsky, Alexander; Hirst, Graeme (2001). "Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures" (PDF). Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001). pp. 24–29. Retrieved 2020-05-20.
  10. Budanitsky, Alexander; Hirst, Graeme (2006). "Evaluating WordNet-based Measures of Lexical Semantic Relatedness". Computational Linguistics. 32 (1): 13–47. doi:10.1162/coli.2006.32.1.13. ISSN 0891-2017. S2CID 838777.
  11. Jiang, Jay J.; Conrath, David W. (1997-09-20). "Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy". arXiv:cmp-lg/9709008.
  12. Moldovan, Dan; Novischi, Adrian (2002). "Lexical chains for question answering". Proceedings of the 19th International Conference on Computational Linguistics. Vol. 1. Taipei, Taiwan: Association for Computational Linguistics. pp. 1–7. doi:10.3115/1072228.1072395.
  13. McCarthy, Diana; Koeling, Rob; Weeds, Julie; Carroll, John (2004). "Finding predominant word senses in untagged text". Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04. Barcelona, Spain: Association for Computational Linguistics: 279–es. doi:10.3115/1218955.1218991.
  14. Ercan, Gonenc; Cicekli, Ilyas (2007). "Using lexical chains for keyword extraction". Information Processing & Management. 43 (6): 1705–1714. doi:10.1016/j.ipm.2007.01.015. hdl:11693/23343.
  15. Wei, Tingting; Lu, Yonghe; Chang, Huiyou; Zhou, Qiang; Bao, Xianyu (2015). "A semantic approach for text clustering using WordNet and lexical chains". Expert Systems with Applications. 42 (4): 2264–2275. doi:10.1016/j.eswa.2014.10.023.
  16. Simov, Kiril; Boytcheva, Svetla; Osenova, Petya (2017-11-10). "Towards Lexical Chains for Knowledge-Graph-based Word Embeddings" (PDF). RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning. Incoma Ltd. Shoumen, Bulgaria: 679–685. doi:10.26615/978-954-452-049-6_087. ISBN 978-954-452-049-6. S2CID 41952796.
  17. Rios Gonzales, Annette; Mascarell, Laura; Sennrich, Rico (2017). "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings". Proceedings of the Second Conference on Machine Translation. Copenhagen, Denmark: Association for Computational Linguistics. pp. 11–19. doi:10.18653/v1/W17-4702.
  18. Mascarell, Laura (2017). "Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation". Proceedings of the Third Workshop on Discourse in Machine Translation. Copenhagen, Denmark: Association for Computational Linguistics: 99–109. doi:10.18653/v1/W17-4813.