MontyLingua

Last updated April 19, 2021

MontyLingua is a popular natural language processing toolkit. It is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for both the Python and Java programming languages. It is enriched with common sense knowledge about the everyday world from Open Mind Common Sense. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. It does not require training. It was written by Hugo Liu at MIT in 2003.
Because it is enriched with common sense knowledge it can avoid many mistakes. e.g.:

Abilities

MontyTokenizer: normalizes punctuation, spacing and contractions, with sensitivity to abbrevs.
MontyTagger: Part-of-speech tagging using the Penn Treebank tagset, enriched with "Common Sense" from the Open Mind Common Sense project. Exceeds accuracy of Brill94 tbl tagger using default training files
MontyREChunker: chunks tagged text into verb, noun, and adjective chunks (VX, NX, and AX respectively)
MontyExtractor: extracts verb-argument structures, phrases, and other semantically valuable information from sentences and returns sentences as "digests"
MontyLemmatiser: part-of-speech sensitive lemmatisation. Strips plurals (geese-->goose) and tense (were-->be, had-->have). Includes regexps from Humphreys and Carroll's morph.lex, and UPENN's XTAG corpus
MontyNLGenerator: generates summaries, generates surface form sentences, determines and numbers NPs and tenses verbs, accounts for sentence_type

Related Research Articles

A verb, from the Latin verbum meaning word, is a word that in syntax conveys an action, an occurrence, or a state of being. In the usual description of English, the basic form, with or without the particle to, is the infinitive. In many languages, verbs are inflected to encode tense, aspect, mood, and voice. A verb may also agree with the person, gender or number of some of its arguments, such as its subject, or object. Verbs have tenses: present, to indicate that an action is being carried out; past, to indicate that an action has been done; future, to indicate that an action will be done.

In linguistics, an adjective is a word that modifies a noun or noun phrase or describes its referent. Its semantic role is to change information given by the noun.

In traditional grammar, a part of speech or part-of-speech is a category of words that have similar grammatical properties. Words that are assigned to the same part of speech generally display similar syntactic behavior—they play similar roles within the grammatical structure of sentences—and sometimes similar morphology in that they undergo inflection for similar properties.

English grammar is the way in which meanings are encoded into wordings in the English language. This includes the structure of words, phrases, clauses, sentences, and whole texts.

Linguistics is the scientific study of human language. Someone who engages in this study is called a linguist. See also the Outline of linguistics, the List of phonetics topics, the List of linguists, and the List of cognitive science topics. Articles related to linguistics include:

In linguistics, a participle is a nonfinite verb form that has some of the characteristics and functions of both verbs and adjectives. More narrowly, participle has been defined as "a word derived from a verb and used as an adjective, as in a laughing face".

A garden-path sentence is a grammatically correct sentence that starts in such a way that a reader's most likely interpretation will be incorrect; the reader is lured into a parse that turns out to be a dead end or yields a clearly unintended meaning. "Garden path" refers to the saying "to be led down [or up] the garden path", meaning to be deceived, tricked, or seduced. In A Dictionary of Modern English Usage, Fowler describes such sentences as unwittingly laying a "false scent".

Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas Link Grammar makes the head-dependent relationship optional. Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words. The relationship between words is indicated with link types, thus making the Link grammar closely related to certain categorial grammars.

In corpus linguistics, part-of-speech tagging, also called grammatical tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Shallow parsing is an analysis of a sentence which first identifies constituent parts of sentences and then links them to higher order units that have discrete grammatical meanings. While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns, approaches that use machine learning techniques can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents. That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher level meanings depending on the context of the sentence.

A grammatical category or grammatical feature is a property of items within the grammar of a language. Within each category there are two or more possible values, which are normally mutually exclusive. Frequently encountered grammatical categories include:

Agreement or concord happens when a word changes form depending on the other words to which it relates. It is an instance of inflection, and usually involves making the value of some grammatical category "agree" between varied words or parts of the sentence.

The grammar of the Polish language is characterized by a high degree of inflection, and has relatively free word order, although the dominant arrangement is subject–verb–object (SVO). There are no articles, and there is frequent dropping of subject pronouns. Distinctive features include the different treatment of masculine personal nouns in the plural, and the complex grammar of numerals and quantifiers.

The grammar of Macedonian is, in many respects, similar to that of some other Balkan languages, especially Bulgarian. Macedonian exhibits a number of grammatical features that distinguish it from most other Slavic languages, such as the elimination of case declension, the development of a suffixed definite article, and the lack of an infinitival verb, among others.

In linguistics, a resultative is a form that expresses that something or someone has undergone a change in state as the result of the completion of an event. Resultatives appear as predicates of sentences, and are generally composed of a verb, a post-verbal noun phrase and a so-called resultative phrase which may be represented by an adjective, a prepositional phrase, or a particle, among others. For example, in the English sentence The man wiped the table clean, the adjective clean denotes the state achieved by the table as a result of the event described as the man wiped.

In linguistic morphology, inflection is a process of word formation, in which a word is modified to express different grammatical categories such as tense, case, voice, aspect, person, number, gender, mood, animacy, and definiteness. The inflection of verbs is called conjugation, and one can refer to the inflection of nouns, adjectives, adverbs, pronouns, determiners, participles, prepositions and postpositions, numerals, articles etc., as declension.

Kēlen is a constructed language created by Sylvia Sotomayor. It is an attempt to create a truly alien language by violating a key linguistic universal—namely that all human languages have verbs. In Kēlen, relationships between the noun phrases making up the sentence are expressed by one of four relationals. According to Sotomayor, these relationals perform the functions of verbs but lack any of the semantic content. However, the semantic content found in common verbs, such as those that are semantic primes, can also be found in Kēlen's relationals, which calls into question whether Kēlen is technically verbless. Despite its distinctive grammar, Kēlen is an expressive and intelligible language; texts written in Kēlen have been translated into other languages by several people other than the creator of the language. In an interview, Sotomayor states that she aims for Kēlen to be naturalistic apart from its verblessness, and that to achieve this she employs the principle "change one thing and keep everything else the same".

Koine Greek grammar is a subclass of Ancient Greek grammar peculiar to the Koine Greek dialect. It includes many forms of Hellenistic era Greek, and authors such as Plutarch and Lucian, as well as many of the surviving inscriptions and papyri.

Mizo grammar is the grammar of the Mizo language, a Tibeto-Burman language spoken by about a million people in Mizoram, Manipur, Tripura, Burma and Chittagong Hill Tracts of Bangladesh. It is a highly inflected language, with fairly complex noun phrase structure and word modifications. Nouns and pronouns are declined, and phrasal nouns also undergo an analogous declension.

A bare noun is a noun that is used without a surface determiner or quantifier. In natural languages, the distribution of bare nouns is subject to various language-specific constraints. Under the DP hypothesis a noun in an argument position must have a determiner or quantifier that introduces the noun, warranting special treatment of the bare nouns that seemingly contradict this. As a result, bare nouns have attracted extensive study in the fields of both semantics and syntax.

References

↑ "MontyLingua V.2.1 (Python and Java) A Free, Commonsense-Enriched Natural Language Understander for English" . Retrieved 2008-12-30.CS1 maint: discouraged parameter (link)

External links

Official website

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "MontyLingua V.2.1 (Python and Java) A Free, Commonsense-Enriched Natural Language Understander for English" . Retrieved 2008-12-30.CS1 maint: discouraged parameter (link)

[1]

MontyLingua

Contents

Abilities

Related Research Articles

References

External links