In linguistics, realization is the process by which some kind of surface representation is derived from its underlying representation; that is, the way in which some abstract object of linguistic analysis comes to be produced in actual language. Phonemes are often said to be realized by speech sounds. The different sounds that can realize a particular phoneme are called its allophones.
Realization is also a subtask of natural language generation, which involves creating an actual text in a human language (English, French, etc.) from a syntactic representation. There are a number of software packages available for realization, most of which have been developed by academic research groups in NLG. The remainder of this article concerns realization of this kind.
For example, the following Java code causes the simplenlg system [1] to print the text "The women do not smoke.":
    NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
    subject.setPlural(true);
    SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
    sentence.setFeature(Feature.NEGATED, true);
    System.out.println(realiser.realiseSentence(sentence));
In this example, the computer program has specified the linguistic constituents of the sentence (verb, subject), and also linguistic features (plural subject, negated), and from this information the realiser has constructed the actual sentence.
Realisation involves three kinds of processing:
Syntactic realisation: Using grammatical knowledge to choose inflections, add function words and also to decide the order of components. For example, in English the subject usually precedes the verb, and the negated form of "smoke" is "do not smoke".
Morphological realisation: Computing inflected forms, for example the plural form of "woman" is "women" (not "womans").
Orthographic realisation: Dealing with casing, punctuation, and formatting. For example, capitalising "The" because it is the first word of the sentence.
The above examples are very basic; most realisers are capable of considerably more complex processing.
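To make the three kinds of processing concrete, the following is a minimal, self-contained sketch of a toy realiser in plain Java. It is not how simplenlg or any other real realiser is implemented: the class name ToyRealiser, its methods, and the tiny irregular-plural lexicon are all hypothetical, and the sketch assumes present-tense, third-person clauses only.

```java
import java.util.Map;

/** Toy realiser for illustration only; all names here are hypothetical. */
final class ToyRealiser {

    // Assumed mini-lexicon of irregular plurals (a real lexicon is far larger).
    private static final Map<String, String> IRREGULAR_PLURALS =
            Map.of("woman", "women", "man", "men", "child", "children");

    /** Morphological realisation: compute the inflected form of a noun. */
    static String inflectNoun(String noun, boolean plural) {
        if (!plural) {
            return noun;
        }
        String irregular = IRREGULAR_PLURALS.get(noun);
        if (irregular != null) {
            return irregular;
        }
        // Naive regular rule: add "es" after a sibilant ending, otherwise "s".
        if (noun.endsWith("s") || noun.endsWith("x")
                || noun.endsWith("ch") || noun.endsWith("sh")) {
            return noun + "es";
        }
        return noun + "s";
    }

    /** Syntactic realisation: order the constituents and add function words. */
    static String buildClause(String determiner, String noun, boolean plural,
                              String verb, boolean negated) {
        String subject = determiner + " " + inflectNoun(noun, plural);
        String verbGroup;
        if (negated) {
            // English negation needs the auxiliary "do", which agrees with the subject.
            verbGroup = (plural ? "do" : "does") + " not " + verb;
        } else {
            // Naive third-person singular agreement: append "s".
            verbGroup = plural ? verb : verb + "s";
        }
        // The subject precedes the verb group.
        return subject + " " + verbGroup;
    }

    /** Orthographic realisation: capitalise the first word and add a full stop. */
    static String finish(String clause) {
        return Character.toUpperCase(clause.charAt(0)) + clause.substring(1) + ".";
    }

    /** Runs the three stages in sequence. */
    static String realise(String determiner, String noun, boolean plural,
                          String verb, boolean negated) {
        return finish(buildClause(determiner, noun, plural, verb, negated));
    }

    public static void main(String[] args) {
        // prints "The women do not smoke."
        System.out.println(realise("the", "woman", true, "smoke", true));
    }
}
```

Keeping each stage in its own method mirrors the division described above; a production realiser would replace the hard-coded rules with a full lexicon and grammar.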
A number of realisers have been developed over the past 20 years. These systems differ in terms of complexity and sophistication of their processing, robustness in dealing with unusual cases, and whether they are accessed programmatically via an API or whether they take a textual representation of a syntactic structure as their input.
There are also major differences in pragmatic factors such as documentation, support, licensing terms, speed and memory usage, etc.
It is not possible to describe all realisers here, but a few of the emerging areas are: