Textual entailment

In natural language processing, textual entailment (TE), also known as natural language inference (NLI), is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text.

Definition

In the TE framework, the entailing and entailed texts are termed text (t) and hypothesis (h), respectively. Textual entailment is not the same as pure logical entailment – it has a more relaxed definition: "t entails h" (t ⇒ h) if, typically, a human reading t would infer that h is most likely true. [1] (Alternatively: t ⇒ h if and only if, typically, a human reading t would be justified in inferring the proposition expressed by h from the proposition expressed by t. [2]) The relation is directional because even if "t entails h", the reverse "h entails t" is much less certain. [3] [4]

Determining whether this relationship holds is an informal task, one which sometimes overlaps with the formal tasks of formal semantics (satisfying a strict condition will usually imply satisfaction of a less strict condition); additionally, textual entailment partially subsumes word entailment.
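
A more formal, probabilistic reading in the spirit of Dagan and Glickman's applied modeling of language variability [3] treats t ⇒ h as holding when the text raises the likelihood of the hypothesis. A minimal sketch of that idea (exact formulations differ between papers) is

    \Pr(h \text{ is true} \mid t) > \Pr(h \text{ is true})

i.e., a reader who has seen t should assign h a higher probability than a reader who has not.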

Examples

Textual entailment can be illustrated with examples of three different relations: [5]

An example of a positive TE (text entails hypothesis) is:

text: If you help the needy, God will reward you.
hypothesis: Giving money to a poor man has good consequences.

An example of a negative TE (text contradicts hypothesis) is:

text: If you help the needy, God will reward you.
hypothesis: Giving money to a poor man has no consequences.

An example of a non-TE (text does not entail nor contradict) is:

text: If you help the needy, God will reward you.
hypothesis: Giving money to a poor man will make you a better person.
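
In machine-readable form, such examples are usually stored as labelled text–hypothesis pairs. The following minimal Python sketch (the field names are illustrative rather than those of any particular dataset) encodes the three cases above:

    # Each NLI example pairs a text (premise) with a hypothesis and a label.
    examples = [
        {"text": "If you help the needy, God will reward you.",
         "hypothesis": "Giving money to a poor man has good consequences.",
         "label": "entailment"},
        {"text": "If you help the needy, God will reward you.",
         "hypothesis": "Giving money to a poor man has no consequences.",
         "label": "contradiction"},
        {"text": "If you help the needy, God will reward you.",
         "hypothesis": "Giving money to a poor man will make you a better person.",
         "label": "neutral"},
    ]

    for ex in examples:
        print(f"{ex['label']:>13}: {ex['hypothesis']}")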

Ambiguity of natural language

A characteristic of natural language is that there are many different ways to state what one wants to say: several meanings can be contained in a single text and the same meaning can be expressed by different texts. This variability of semantic expression can be seen as the dual problem of language ambiguity. Together, they result in a many-to-many mapping between language expressions and meanings. The task of paraphrasing involves recognizing when two texts have the same meaning and creating a similar or shorter text that conveys almost the same information. Textual entailment is similar [6] but weakens the relationship to be unidirectional. Mathematical solutions to establish textual entailment can be based on the directional property of this relation, by making a comparison between some directional similarities of the texts involved. [4]
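
As an illustration of such a directional comparison (a simplified sketch, not the specific measure proposed in the cited work), a surface-level system might score how much of the hypothesis is lexically covered by the text; because the measure is asymmetric, swapping text and hypothesis generally changes the score:

    def directional_overlap(text, hypothesis):
        """Fraction of hypothesis words that also occur in the text.

        A crude, asymmetric similarity: the coverage of h by t is not the
        same as the coverage of t by h, mirroring the directionality of
        entailment. Real systems add stemming, synonyms, syntax, and more.
        """
        t_words = set(text.lower().split())
        h_words = set(hypothesis.lower().split())
        if not h_words:
            return 0.0
        return len(h_words & t_words) / len(h_words)

    t = "Giving money to a poor man has good consequences"
    h = "Giving money has consequences"
    print(directional_overlap(t, h))  # coverage of h by t
    print(directional_overlap(h, t))  # coverage of t by h (generally different)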

Approaches

Textual entailment measures natural language understanding as it asks for a semantic interpretation of the text, and due to its generality it remains an active area of research. Many approaches and refinements of approaches have been considered, such as word embedding, logical models, graphical models, rule systems, contextual focusing, and machine learning. [6] Practical or large-scale solutions avoid these complex methods and instead use only surface syntax or lexical relationships, but are correspondingly less accurate. [3] As of 2005, state-of-the-art systems were still far from human performance; a study found humans agreeing with one another on an entailment dataset 95.25% of the time. [7] Algorithms from 2016 had not yet achieved 90% accuracy. [8]
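
As a concrete sketch of the machine-learning approach as practised today (assuming the Hugging Face Transformers library and the publicly available roberta-large-mnli checkpoint, neither of which is prescribed by the sources above), a model fine-tuned on an NLI corpus can be queried for the entailment/neutral/contradiction decision directly:

    # Minimal NLI inference sketch; assumes the Hugging Face Transformers
    # library and the roberta-large-mnli checkpoint (any model fine-tuned
    # on an NLI dataset is used the same way).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "roberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    text = "If you help the needy, God will reward you."
    hypothesis = "Giving money to a poor man has good consequences."

    # Encode the pair; the tokenizer inserts the separator tokens the model expects.
    inputs = tokenizer(text, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze()

    # Label names (e.g. CONTRADICTION / NEUTRAL / ENTAILMENT) are read from the
    # model's own configuration rather than assuming a fixed ordering.
    for idx, p in enumerate(probs.tolist()):
        print(f"{model.config.id2label[idx]}: {p:.3f}")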

Applications

Many natural language processing applications, like question answering, information extraction, summarization, multi-document summarization, and evaluation of machine translation systems, need to recognize that a particular target meaning can be inferred from different text variants. Typically entailment is used as part of a larger system, for example in a prediction system to filter out trivial or obvious predictions. [9] Textual entailment also has applications in adversarial stylometry, which has the objective of removing textual style without changing the overall meaning of communication. [10]

Datasets

Some of the available English NLI datasets include:

SNLI (Stanford Natural Language Inference) [11]
MultiNLI (Multi-Genre Natural Language Inference) [12]
SciTail, built from science-exam question answering [13]
SICK (Sentences Involving Compositional Knowledge) [14]
MedNLI, drawn from the clinical domain [15]
NLI data derived automatically from question answering datasets [16]

In addition, there are several non-English NLI datasets, including:

XNLI, a cross-lingual evaluation set covering 15 languages [17]
FarsTail (Persian) [18]
OCNLI (Original Chinese Natural Language Inference) [19]
SICK-NL (Dutch) [20]
IndoNLI (Indonesian) [21]
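
Such corpora are typically distributed as simple text files and are also mirrored on public dataset hubs. As one hedged example of working with them (assuming the Hugging Face datasets library and the "snli" identifier under which SNLI is mirrored there, neither of which is part of the original release), a corpus can be loaded and inspected as follows:

    # Sketch: loading an NLI corpus for experimentation (assumes the
    # Hugging Face "datasets" library; other NLI corpora on the hub
    # load the same way under their own identifiers).
    from datasets import load_dataset

    snli = load_dataset("snli", split="train")
    example = snli[0]
    print(example["premise"])
    print(example["hypothesis"])
    # In this distribution the integer label indexes the class names
    # entailment / neutral / contradiction (unlabelled examples use -1).
    print(example["label"])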

Related Research Articles

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious or automatic, but it can come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other tasks such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP) that is concerned with building systems that automatically answer questions that are posed by humans in a natural language.

A paraphrase or rephrase is the rendering of the same text in different words without losing the meaning of the text itself. More often than not, a paraphrased text can convey its meaning better than the original words; in other words, a paraphrase is a copy of the text in meaning that differs from the original in wording. For example, when someone tells a story they heard in their own words, they paraphrase, with the meaning being the same. The term itself is derived via Latin paraphrasis, from Ancient Greek παράφρασις (paráphrasis) 'additional manner of expression'. The act of paraphrasing is also called paraphrasis.

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".

Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, also more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly.

A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.

In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word. Given that the output of word-sense induction is a set of senses for the target word, this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.

In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats, and "the" and "is" will appear approximately equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. The "topics" produced by topic modeling techniques are clusters of similar words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document's balance of topics is.

In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN) is the task of assigning a unique identity to entities mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris". Entity linking is different from named-entity recognition (NER) in that NER identifies the occurrence of a named entity in text but it does not identify which specific entity it is.

LEPOR is an automatic language independent machine translation evaluation metric with tunable parameters and reinforced factors.

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

Native-language identification (NLI) is the task of determining an author's native language (L1) based only on their writings in a second language (L2). NLI works through identifying language-usage patterns that are common to specific L1 groups and then applying this knowledge to predict the native language of previously unseen texts. This is motivated in part by applications in second-language acquisition, language teaching and forensic linguistics, amongst others.

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.

Paraphrase or paraphrasing in computational linguistics is the natural language processing task of detecting and generating paraphrases. Applications of paraphrasing are varied including information retrieval, question answering, text summarization, and plagiarism detection. Paraphrasing is also useful in the evaluation of machine translation, as well as semantic parsing and generation of new samples to expand existing corpora.

Bidirectional Encoder Representations from Transformers (BERT) is a language model based on the transformer architecture, notable for its dramatic improvement over previous state of the art models. It was introduced in October 2018 by researchers at Google. A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in Natural Language Processing (NLP) experiments counting over 150 research publications analyzing and improving the model."

Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception.

Emotion recognition in conversation (ERC) is a sub-field of emotion recognition that focuses on mining human emotions from conversations or dialogues between two or more interlocutors. The datasets in this field are usually derived from social platforms that offer a wealth of freely available, often multimodal, samples. Self- and inter-personal influences play a critical role in identifying basic emotions such as fear, anger, joy, and surprise. The more fine-grained the emotion labels, the harder it is to detect the correct emotion. ERC poses a number of challenges, such as conversational-context modeling, speaker-state modeling, the presence of sarcasm in conversation, and emotion shift across consecutive utterances of the same interlocutor.

Pythia is an ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. It was created by Yannis Assael, Thea Sommerschield, and Jonathan Prag, researchers from Google DeepMind and the University of Oxford.

deepset is an enterprise software vendor that provides developers with the tools to build production-ready natural language processing (NLP) systems. It was founded in 2018 in Berlin by Milos Rusic, Malte Pietsch, and Timo Möller. deepset authored and maintains the open source software Haystack and its commercial SaaS offering deepset Cloud.

Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, and hoaxers and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop.

References

  1. Ido Dagan, Oren Glickman and Bernardo Magnini. The PASCAL Recognising Textual Entailment Challenge, p. 2 Archived 2012-03-03 at the Wayback Machine in: Quiñonero-Candela, J.; Dagan, I.; Magnini, B.; d'Alché-Buc, F. (Eds.) Machine Learning Challenges. Lecture Notes in Computer Science, Vol. 3944, pp. 177–190, Springer, 2006.
  2. Korman, Daniel Z.; Mack, Eric; Jett, Jacob; Renear, Allen H. (2018-03-09). "Defining textual entailment". Journal of the Association for Information Science and Technology. 69 (6): 763–772. doi:10.1002/asi.24007. ISSN   2330-1635. S2CID   46920779.
  3. Dagan, I. and O. Glickman. 'Probabilistic textual entailment: Generic applied modeling of language variability' Archived 2012-03-29 at the Wayback Machine in: PASCAL Workshop on Learning Methods for Text Understanding and Mining (2004) Grenoble.
  4. Tătar, D. et al. Textual Entailment as a Directional Relation
  5. Textual Entailment Portal on the Association for Computational Linguistics wiki
  6. Androutsopoulos, Ion; Malakasiotis, Prodromos (2010). "A Survey of Paraphrasing and Textual Entailment Methods" (PDF). Journal of Artificial Intelligence Research. 38: 135–187. arXiv: 0912.3747 . doi:10.1613/jair.2985. S2CID   9234833. Archived from the original (PDF) on 9 December 2017. Retrieved 13 February 2017.
  7. Bos, Johan; Markert, Katja (6–8 October 2005). "Recognising textual entailment with logical inference". In Raymond Mooney; Joyce Chai; et al. (eds.). Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing – HLT '05. Vancouver: Association for Computational Linguistics. pp. 628–635. doi: 10.3115/1220575.1220654 . S2CID   10202504.
  8. Zhao, Kai; Huang, Liang; Ma, Mingbo (4 January 2017). "Textual Entailment with Structured Attentions and Composition". arXiv: 1701.01126 [cs.CL].
  9. Shani, Ayelett (25 October 2013). "How Dr. Kira Radinsky Used Algorithms to Predict Riots in Egypt". Haaretz. Retrieved 13 February 2017.
  10. Potthast, Hagen & Stein 2016, p. 11-12.
  11. Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D. (2015). A large annotated corpus for learning natural language inference (PDF). In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. pp. 632–642. doi:10.18653/v1/D15-1075.
  12. Williams, Adina; Nangia, Nikita; Bowman, Samuel R. (2018). A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference (PDF). In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics. pp. 1112–1122. doi:10.18653/v1/N18-1101.
  13. Khot, Tushar; Sabharwal, Ashish; Clark, Peter (2018). "SciTaiL: A Textual Entailment Dataset from Science Question Answering". Proceedings of the AAAI Conference on Artificial Intelligence. 32 (1). doi: 10.1609/aaai.v32i1.12022 .
  14. Marelli, Marco; Bentivogli, Luisa; Baroni, Marco; Bernardi, Raffaella; Menini, Stefano; Zamparelli, Roberto (2014). SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment (PDF). In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Dublin, Ireland: Association for Computational Linguistics. pp. 1–8. doi:10.3115/v1/S14-2001.
  15. Romanov, Alexey; Shivade, Chaitanya (2018). Lessons from Natural Language Inference in the Clinical Domain (PDF). In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. pp. 1586–1596. doi:10.18653/v1/D18-1187.
  16. Demszky, Dorottya; Guu, Kelvin; Liang, Percy (2018). "Transforming Question Answering Datasets Into Natural Language Inference Datasets". arXiv: 1809.02922 [cs.CL].
  17. Conneau, Alexis; Rinott, Ruty; Lample, Guillaume; Williams, Adina; Bowman, Samuel R.; Schwenk, Holger; Stoyanov, Veselin (2018). XNLI: Evaluating Cross-lingual Sentence Representations (PDF). In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. pp. 2475–2485. doi:10.18653/v1/D18-1269.
  18. Amirkhani, Hossein; AzariJafari, Mohammad; Faridan-Jahromi, Soroush; Kouhkan, Zeinab; Pourjafari, Zohreh; Amirak, Azadeh (2023-07-07). "FarsTail: a Persian natural language inference dataset". Soft Computing. arXiv: 2009.08820 . doi:10.1007/s00500-023-08959-3. ISSN   1433-7479. S2CID   221802461.
  19. Hu, Hai; Richardson, Kyle; Xu, Liang; Li, Lu; Kübler, Sandra; Moss, Lawrence (2020). OCNLI: Original Chinese Natural Language Inference (PDF). In Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 3512–3526. doi:10.18653/v1/2020.findings-emnlp.314.
  20. Wijnholds, Gijs; Moortgat, Michael (2021). SICK-NL: A Dataset for Dutch Natural Language Inference (PDF). In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics. pp. 1474–1479. doi:10.18653/v1/2021.eacl-main.126.
  21. Mahendra, Rahmad; Aji, Alham Fikri; Louvan, Samuel; Rahman, Fahrurrozi; Vania, Clara (2021). IndoNLI: A Natural Language Inference Dataset for Indonesian (PDF). In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. pp. 10511–10527. doi:10.18653/v1/2021.emnlp-main.821.

Bibliography