Lexical simplification

Last updated February 27, 2020

Lexical simplification is a sub-task of text simplification. It can be defined as any lexical substitution task that reduce text complexity.

Related Research Articles

Emacs Lisp dialect of Lisp used in GNU Emacs

Emacs Lisp is a dialect of the Lisp programming language used as a scripting language by Emacs. It is used for implementing most of the editing functionality built into Emacs, the remainder being written in C, as is the Lisp interpreter. Emacs Lisp is also termed Elisp, although there is also an older, unrelated Lisp dialect with that name.

In traditional grammar, a part of speech is a category of words that have similar grammatical properties. Words that are assigned to the same part of speech generally display similar syntactic behavior—they play similar roles within the grammatical structure of sentences—and sometimes similar morphology in that they undergo inflection for similar properties.

In computer programming, the scope of a name binding—an association of a name to an entity, such as a variable—is the region of a computer program where the binding is valid: where the name can be used to refer to the entity. Such a region is referred to as a scope block. In other parts of the program the name may refer to a different entity, or to nothing at all.

In computational linguistics, word-sense disambiguation (WSD) is an open problem concerned with identifying which sense of a word is used in a sentence. The solution to this problem impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters into a sequence of tokens. A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.

In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.

Context may refer to:

In mathematics, a change of variables is a basic technique used to simplify problems in which the original variables are replaced with functions of other variables. The intent is that when expressed in new variables, the problem may become simpler, or equivalent to a better understood problem.

In linguistics, a word sense is one of the meanings of a word. Words are in two sets: a large set with multiple meanings and a small set with only one meaning. For example, a dictionary may have over 50 different senses of the word "play", each of these having a different meaning based on the context of the word's usage in a sentence, as follows:

We went to see the playRomeo and Juliet at the theater.

The coach devised a great play that put the visiting team on the defensive.

The children went out to play in the park.

Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the grammar and structure of the prose is greatly simplified, while the underlying meaning and information remains the same. Text simplification is an important area of research, because natural human languages ordinarily contain large vocabularies and complex compound constructions that are not easily processed through automation. In terms of reducing language diversity, semantic compression can be employed to limit and simplify a set of words used in given texts.

Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning. It is related to the broader concept of coherence.

Semantic analytics, also termed semantic relatedness, is the use of ontologies to analyze content in web resources. This field of research combines text analytics and Semantic Web technologies like RDF. Semantic analytics measures the relatedness of different ontological concepts.

CELPE-Bras is the only certificate of proficiency in Brazilian Portuguese as a second language officially recognized and developed by the Brazilian Ministry of Education. The Celpe-Bras exam is offered in Brazil and many other countries, such as the United States, Germany, Chile, Colombia and Japan, with the support of the Brazilian Ministry of International Relations.

Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the following text: "After the match, replace any remaining fluid deficit to prevent chronic dehydration throughout the tournament", a substitute of game might be given.

SemEval is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

In natural language processing, semantic compression is a process of compacting a lexicon used to build a textual document by reducing language heterogeneity, while maintaining text semantics. As a result, the same ideas can be represented using a smaller set of words.

PEBL is an open source software program that allows researchers to design and run psychological experiments. It runs on PCs using Windows, OS X, and Linux, using the cross-platform Simple DirectMedia Library (libSDL). It was first released in 2003.

Bilingual lexical access is an area in psycholinguistics research that studies the activation or retrieval process of the mental lexicon for people who can speak two languages.

<i>Language Contact and Lexical Enrichment in Israeli Hebrew</i>

Language Contact and Lexical Enrichment in Israeli Hebrew is a scholarly book written by linguist Ghil'ad Zuckermann, published in 2003 by Palgrave Macmillan. The book proposes a socio-philological framework for the analysis of "camouflaged borrowing" such as phono-semantic matching. It introduces for the first time a classification for "multisourced neologisms", new words that are based on two or more sources at the same time.

Scott Andrew Crossley is an American linguist. He is a professor of applied linguistics at the Georgia State University, United States. His research focuses on natural language processing and the application of computational tools and machine learning algorithms in learning analytics including second language acquisition, second language writing, and readability. His main interest area is the development and use of natural language processing tools in assessing writing quality and text difficulty.

References

Advaith Siddharthan. "Syntactic Simplification and Text Cohesion". In Research on Language and Computation, Volume 4, Issue 1, Jun 2006, Pages 77–109, Springer Science, the Netherlands.
Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proc. of the NAACL-HLT 2009, Boulder, USA, June.

External links

This computational linguistics-related article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Lexical simplification

Contents

See also

Related Research Articles

References

External links