Sequence labeling

In machine learning, sequence labeling is a type of pattern recognition task that involves the algorithmic assignment of a categorical label to each member of a sequence of observed values. A common example of a sequence labeling task is part-of-speech tagging, which seeks to assign a part of speech to each word in an input sentence or document. Sequence labeling can be treated as a set of independent classification tasks, one per member of the sequence. However, accuracy is generally improved by making the optimal label for a given element dependent on the choices of nearby elements, using special algorithms to choose the globally best set of labels for the entire sequence at once.

As an example of why finding the globally best label sequence might produce better results than labeling one item at a time, consider the part-of-speech tagging task just described. Frequently, many words are members of multiple parts of speech, and the correct label of such a word can often be deduced from the correct label of the word to the immediate left or right. For example, the word "sets" can be either a noun or verb. In a phrase like "he sets the books down", the word "he" is unambiguously a pronoun, and "the" unambiguously a determiner, and using either of these labels, "sets" can be deduced to be a verb, since nouns very rarely follow pronouns and are less likely to precede determiners than verbs are. But in other cases, only one of the adjacent words is similarly helpful. In "he sets and then knocks over the table", only the word "he" to the left is helpful (cf. "...picks up the sets and then knocks over..."). Conversely, in "... and also sets the table" only the word "the" to the right is helpful (cf. "... and also sets of books were ..."). An algorithm that proceeds from left to right, labeling one word at a time, can only use the tags of left-adjacent words and might fail in the second example above; vice versa for an algorithm that proceeds from right to left.
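To make the contrast concrete, the sketch below tags the fragment "also sets the table" with a toy scoring model; the emission and transition numbers are invented for illustration rather than estimated from any corpus. The greedy left-to-right pass must commit to a tag for "sets" before seeing "the", whereas the joint search scores complete tag sequences and can exploit the right-hand context.

    # Toy comparison of greedy left-to-right tagging versus joint scoring.
    # All probabilities below are illustrative assumptions, not corpus estimates.
    from itertools import product

    words = ["also", "sets", "the", "table"]
    tags = ["ADV", "NOUN", "VERB", "DET"]

    # How plausible each tag is for each word in isolation.
    emission = {
        "also":  {"ADV": 0.9, "NOUN": 0.05, "VERB": 0.05, "DET": 0.0},
        "sets":  {"ADV": 0.0, "NOUN": 0.5,  "VERB": 0.5,  "DET": 0.0},
        "the":   {"ADV": 0.0, "NOUN": 0.0,  "VERB": 0.0,  "DET": 1.0},
        "table": {"ADV": 0.0, "NOUN": 0.9,  "VERB": 0.1,  "DET": 0.0},
    }

    # Tag-to-tag scores: verbs often precede determiners, nouns rarely do.
    transition = {("ADV", "NOUN"): 0.5, ("ADV", "VERB"): 0.4,
                  ("NOUN", "DET"): 0.05, ("VERB", "DET"): 0.6,
                  ("DET", "NOUN"): 0.8}

    def trans(prev, cur):
        return transition.get((prev, cur), 0.01)

    # Greedy: choose the best tag for each word given only the tag to its left.
    greedy = []
    for i, word in enumerate(words):
        prev = greedy[i - 1] if i > 0 else None
        greedy.append(max(tags, key=lambda t: emission[word][t] * (trans(prev, t) if prev else 1.0)))

    # Joint: score every complete tag sequence and keep the best one.
    def joint_score(seq):
        score = emission[words[0]][seq[0]]
        for i in range(1, len(words)):
            score *= trans(seq[i - 1], seq[i]) * emission[words[i]][seq[i]]
        return score

    best = max(product(tags, repeat=len(words)), key=joint_score)
    print("greedy:", greedy)       # tags "sets" as NOUN: the left context alone cannot decide
    print("joint: ", list(best))   # tags "sets" as VERB, because VERB -> DET far outweighs NOUN -> DET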

Most sequence labeling algorithms are probabilistic in nature, relying on statistical inference to find the best sequence. The most common statistical models in use for sequence labeling make a Markov assumption, i.e. that the choice of label for a particular word is directly dependent only on the immediately adjacent labels; hence the set of labels forms a Markov chain. This leads naturally to the hidden Markov model (HMM), one of the most common statistical models used for sequence labeling. Other common models in use are the maximum entropy Markov model and conditional random field.
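As an illustration of decoding under the Markov assumption, the sketch below implements the Viterbi algorithm for a small hand-built HMM and tags the running example "he sets the books down". The tag set, transition table and emission table are assumptions chosen for the example, not values estimated from data.

    # Viterbi decoding for a toy HMM tagger. The probabilities are illustrative
    # assumptions; a real tagger would estimate them from a tagged corpus.

    def viterbi(observations, states, start_p, trans_p, emit_p):
        """Return the most probable hidden state sequence for the observations."""
        # best[t][s]: probability of the best path ending in state s at position t.
        # The 1e-6 floor is crude smoothing for events missing from the tables.
        best = [{s: start_p[s] * emit_p[s].get(observations[0], 1e-6) for s in states}]
        backpointer = [{}]
        for t in range(1, len(observations)):
            best.append({})
            backpointer.append({})
            for s in states:
                prev, p = max(((r, best[t - 1][r] * trans_p[r].get(s, 1e-6)) for r in states),
                              key=lambda pair: pair[1])
                best[t][s] = p * emit_p[s].get(observations[t], 1e-6)
                backpointer[t][s] = prev
        # Trace the highest-probability path back from the last position.
        path = [max(states, key=lambda s: best[-1][s])]
        for t in range(len(observations) - 1, 0, -1):
            path.append(backpointer[t][path[-1]])
        return list(reversed(path))

    states = ["PRON", "VERB", "NOUN", "DET", "PART"]
    start_p = {"PRON": 0.5, "VERB": 0.1, "NOUN": 0.2, "DET": 0.15, "PART": 0.05}
    trans_p = {
        "PRON": {"VERB": 0.8, "NOUN": 0.05, "DET": 0.05, "PART": 0.05, "PRON": 0.05},
        "VERB": {"DET": 0.5, "NOUN": 0.2, "PART": 0.2, "PRON": 0.05, "VERB": 0.05},
        "NOUN": {"PART": 0.3, "VERB": 0.3, "DET": 0.1, "NOUN": 0.2, "PRON": 0.1},
        "DET":  {"NOUN": 0.9, "VERB": 0.02, "DET": 0.02, "PART": 0.03, "PRON": 0.03},
        "PART": {"PRON": 0.3, "DET": 0.3, "NOUN": 0.2, "VERB": 0.1, "PART": 0.1},
    }
    emit_p = {
        "PRON": {"he": 0.9},
        "VERB": {"sets": 0.5},
        "NOUN": {"sets": 0.5, "books": 0.8},
        "DET":  {"the": 0.9},
        "PART": {"down": 0.8},
    }

    print(viterbi(["he", "sets", "the", "books", "down"], states, start_p, trans_p, emit_p))
    # -> ['PRON', 'VERB', 'DET', 'NOUN', 'PART']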

Related Research Articles

Natural language processing – field of computer science and linguistics

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The result is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields.

Adjective – part of speech that describes a noun or pronoun

In linguistics, an adjective is a word that modifies a noun or noun phrase or describes its referent. Its semantic role is to change information given by the noun.

Phrase – group of words

In everyday speech, a phrase is any group of words, often carrying a special idiomatic meaning; in this sense it is synonymous with expression. In linguistic analysis, a phrase is a group of words that functions as a constituent in the syntax of a sentence, a single unit within a grammatical hierarchy. A phrase typically appears within a clause, but it is possible also for a phrase to be a clause or to contain a clause within it. There are also types of phrases like noun phrase and prepositional phrase.

In traditional grammar, a part of speech or part-of-speech is a category of words that have similar grammatical properties. Words that are assigned to the same part of speech generally display similar syntactic behavior—they play similar roles within the grammatical structure of sentences—and sometimes similar morphology in that they undergo inflection for similar properties.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process – call it X – with unobservable ("hidden") states. HMM assumes that there is another process Y whose behavior "depends" on X. The goal is to learn about X by observing Y. HMM stipulates that, for each time instance n, the conditional probability distribution of Y_n given the history {X_k = x_k : k ≤ n} must not depend on the earlier values x_k with k < n.
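A small generative sketch of this definition follows, using a made-up weather-and-activity model: the hidden chain evolves by the transition matrix alone, and each observation is drawn from a distribution that depends only on the current hidden state.

    # Sampling from a toy HMM (all numbers invented) to illustrate the definition:
    # X evolves as a Markov chain, and each Y_t is drawn using only X_t.
    import random

    transition = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
                  "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emission = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
                "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def draw(dist):
        return random.choices(list(dist), weights=list(dist.values()))[0]

    def sample(length, start="Sunny"):
        x, hidden, observed = start, [], []
        for _ in range(length):
            hidden.append(x)
            observed.append(draw(emission[x]))  # Y_t depends only on the current X_t
            x = draw(transition[x])             # X_{t+1} depends only on X_t
        return hidden, observed

    print(sample(5))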

Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power. Pattern recognition and machine learning can be viewed as two facets of the same field, and together they have undergone substantial development over the past few decades. A modern definition of pattern recognition is:

The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories.

The plural, in many languages, is one of the values of the grammatical category of number. Plurals of nouns typically denote a quantity other than the default quantity represented by a noun, which is generally one. Most commonly, therefore, plurals are used to denote two or more of something, although they may also denote fractional, zero or negative amounts. An example of a plural is the English word cats, which corresponds to the singular cat.

In linguistics, a determiner phrase (DP) is a type of phrase posited by some theories of syntax. The head of a DP is a determiner, as opposed to a noun. For example, in the phrase "the car", "the" is a determiner and "car" is a noun; the two combine to form a phrase, and on the DP analysis, the determiner "the" is the head over the noun "car". The existence of DPs is a controversial issue in the study of syntax. The traditional analysis of phrases such as "the car" is that the noun is the head, which means the phrase is a noun phrase (NP), not a determiner phrase. Beginning in the mid-1980s, an alternative analysis arose that posits the determiner as the head, which makes the phrase a DP instead of an NP.

Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas link grammar makes the head-dependent relationship optional. Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words. The relationship between words is indicated with link types, thus making link grammar closely related to certain categorial grammars.

In corpus linguistics, part-of-speech tagging, also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

The Brill tagger is an inductive method for part-of-speech tagging. It was described and invented by Eric Brill in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger": a form of supervised learning that aims to minimize error, in which an initial tag is assigned to each word and then changed using a set of learned rules.
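The sketch below illustrates the transformation-based idea on a toy lexicon. Both the initial-tag lexicon and the two contextual rules are invented for illustration, and the error-driven procedure that learns such rules from a tagged corpus is omitted.

    # Transformation-based ("Brill-style") tagging sketch: start from each word's
    # most frequent tag, then apply contextual rewrite rules in order. The lexicon
    # and rules here are invented; a real Brill tagger learns its rules from data.

    most_frequent_tag = {"he": "PRON", "wants": "VERB", "to": "TO",
                         "race": "NOUN", "the": "DET"}

    # Each rule: (from_tag, to_tag, condition on the previous tag).
    rules = [
        ("NOUN", "VERB", lambda prev: prev == "TO"),   # "to race": race acts as a verb
        ("VERB", "NOUN", lambda prev: prev == "DET"),  # "the race": race acts as a noun
    ]

    def brill_tag(words):
        tags = [most_frequent_tag.get(w, "NOUN") for w in words]  # initial guess
        for from_tag, to_tag, condition in rules:
            for i in range(1, len(tags)):
                if tags[i] == from_tag and condition(tags[i - 1]):
                    tags[i] = to_tag
        return list(zip(words, tags))

    print(brill_tag(["he", "wants", "to", "race"]))
    # -> [('he', 'PRON'), ('wants', 'VERB'), ('to', 'TO'), ('race', 'VERB')]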

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighboring" samples, a CRF can take context into account. To do so, the predictions are modeled as a graphical model, which represents the dependencies between them. The kind of graph used depends on the application. For example, in natural language processing, linear chain CRFs are popular, which implement sequential dependencies in the predictions. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.
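For the linear-chain case, the idea can be summarized with a small sketch: a whole tag sequence is scored with word-tag and tag-tag features and then normalized over all possible sequences. The feature weights below are hand-set assumptions, and the partition function is computed by brute force for clarity (real implementations use the forward algorithm).

    # Linear-chain CRF sketch with hand-set feature weights (illustrative only).
    from itertools import product
    from math import exp

    TAGS = ["NOUN", "VERB", "DET"]
    # Word-tag compatibility and tag-tag transition features.
    emit_w = {("sets", "NOUN"): 1.0, ("sets", "VERB"): 1.0,
              ("the", "DET"): 3.0, ("table", "NOUN"): 2.0}
    trans_w = {("VERB", "DET"): 2.0, ("DET", "NOUN"): 2.0, ("NOUN", "DET"): -1.0}

    def score(words, tags):
        """Unnormalized log-score of one complete tag sequence."""
        s = sum(emit_w.get((w, t), 0.0) for w, t in zip(words, tags))
        s += sum(trans_w.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
        return s

    def prob(words, tags):
        """p(tags | words): normalize over every possible tag sequence (brute force)."""
        z = sum(exp(score(words, y)) for y in product(TAGS, repeat=len(words)))
        return exp(score(words, tags)) / z

    words = ["sets", "the", "table"]
    best = max(product(TAGS, repeat=len(words)), key=lambda y: score(words, y))
    print(best, round(prob(words, best), 3))
    # The transition features make ('VERB', 'DET', 'NOUN') the highest-scoring labeling.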

The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language. It has an overall accuracy rate of 96-97% with the latest version (CLAWS4) tagging around 100 million words of the British National Corpus.

Structured prediction or structured (output) learning is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.

In statistics, a maximum-entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. An MEMM is a discriminative model that extends a standard maximum entropy classifier by assuming that the unknown values to be learnt are connected in a Markov chain rather than being conditionally independent of each other. MEMMs find applications in natural language processing, specifically in part-of-speech tagging and information extraction.
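As a sketch of the idea (with hand-set feature weights rather than trained ones), each position gets a maximum entropy distribution over tags conditioned on the current word and the previous tag; decoding then chains these local distributions, here with a simple greedy pass rather than the Viterbi search a full MEMM would use.

    # MEMM sketch: a per-position maximum entropy (softmax) classifier over tags,
    # conditioned on the current word and the previous tag. Weights are invented.
    from math import exp

    TAGS = ["PRON", "NOUN", "VERB", "DET"]
    weights = {("word=he", "PRON"): 2.0,
               ("word=sets", "NOUN"): 1.0, ("word=sets", "VERB"): 1.0,
               ("word=the", "DET"): 3.0,
               ("prev=PRON", "VERB"): 2.0, ("prev=VERB", "DET"): 1.0}

    def local_dist(word, prev_tag):
        """P(tag | current word, previous tag) as a maximum entropy classifier."""
        scores = {t: exp(weights.get(("word=" + word, t), 0.0)
                         + weights.get(("prev=" + prev_tag, t), 0.0)) for t in TAGS}
        z = sum(scores.values())
        return {t: s / z for t, s in scores.items()}

    def greedy_decode(words):
        """Chain the local distributions left to right (a full MEMM would run Viterbi)."""
        prev, out = "<s>", []
        for w in words:
            dist = local_dist(w, prev)
            prev = max(dist, key=dist.get)
            out.append(prev)
        return out

    print(greedy_decode(["he", "sets", "the"]))
    # -> ['PRON', 'VERB', 'DET']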

English determiners

An important role in English grammar is played by determiners – words or phrases that precede a noun or noun phrase and serve to express its reference in the context. The most common of these are the definite and indefinite articles, the and a(n). Other determiners in English include demonstratives such as this and that, possessives such as my and the boy's, and quantifiers such as all, many, and three.
