Semantic lexicon

A visual representation of a Semantic Lexicon

A semantic lexicon is a digital dictionary of words labeled with semantic classes, so that associations can be drawn between words that have not previously been encountered. [1] Semantic lexicons are built upon semantic networks, which represent the semantic relations between words. The difference between the two is that a semantic lexicon also includes a definition, or "gloss", for each word. [2]

Structure

Semantic lexicons are made up of lexical entries. These entries are not orthographic but semantic, which eliminates issues of homonymy and polysemy. The lexical entries are interconnected by semantic relations such as hypernymy, hyponymy, meronymy, and troponymy. Synonymous entries are grouped together in what the Princeton WordNet calls "synsets". [2] Most semantic lexicons are made up of four different "sub-nets": [2] nouns, verbs, adjectives, and adverbs, though some researchers have taken steps to add an "artificial node" interconnecting the sub-nets. [3]
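As a rough illustration of this structure, the sketch below models a lexical entry as a small Python record. The class name `Synset`, the identifiers such as `bank.n.01`, and the glosses are hypothetical toy data and do not reproduce the storage format of WordNet or any other lexicon.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One lexical entry: a concept rather than a spelling (toy model)."""
    id: str            # hypothetical identifier, e.g. "bank.n.01"
    pos: str           # sub-net: "noun", "verb", "adjective" or "adverb"
    lemmas: list[str]  # synonymous word forms grouped in this entry
    gloss: str         # short definition
    relations: dict[str, set[str]] = field(default_factory=dict)

    def relate(self, kind: str, target_id: str) -> None:
        """Record a semantic relation (e.g. "hypernym") to another entry."""
        self.relations.setdefault(kind, set()).add(target_id)


# The same spelling appears in two entries, which keeps its senses apart.
lexicon = {
    s.id: s
    for s in [
        Synset("bank.n.01", "noun", ["bank"], "sloping land beside water"),
        Synset("bank.n.02", "noun", ["bank", "depository"], "a financial institution"),
        Synset("institution.n.01", "noun", ["institution"], "an established organization"),
    ]
}
lexicon["bank.n.02"].relate("hypernym", "institution.n.01")
lexicon["institution.n.01"].relate("hyponym", "bank.n.02")


def senses(word: str) -> list[str]:
    """Return the ids of every entry that lists `word` as a lemma."""
    return sorted(s.id for s in lexicon.values() if word in s.lemmas)


print(senses("bank"))  # ['bank.n.01', 'bank.n.02']
```

Keying the lexicon by entry id rather than by spelling is what lets one word form belong to several unrelated entries.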

Nouns

Nouns are ordered into a taxonomy: a hierarchy in which the broadest and most encompassing noun, such as "thing", is located at the top, with the nouns becoming more and more specific the further they are from the top. The very top noun in a semantic lexicon is called a unique beginner. [4] The most specific nouns (those that do not have any subordinates) are terminal nodes. [3]
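The hierarchy can be sketched as a mapping from each noun to its immediate superordinate. The taxonomy below is a toy example, the intermediate node "furniture" is an assumption added for illustration, and the function names are hypothetical.

```python
# Toy taxonomy: each noun maps to its immediate superordinate (None at the top).
HYPERNYM = {
    "thing": None,
    "animal": "thing",
    "dog": "animal",
    "Rhodesian Ridgeback": "dog",
    "furniture": "thing",  # assumed intermediate node, for illustration only
    "chair": "furniture",
    "stool": "chair",
}


def unique_beginners(taxonomy):
    """Nouns with no superordinate, i.e. the top of the hierarchy."""
    return [noun for noun, parent in taxonomy.items() if parent is None]


def terminal_nodes(taxonomy):
    """Nouns that are never the superordinate of another noun."""
    parents = set(taxonomy.values())
    return [noun for noun in taxonomy if noun not in parents]


def path_to_top(noun, taxonomy):
    """Walk from a noun up to its unique beginner."""
    path = [noun]
    while taxonomy[path[-1]] is not None:
        path.append(taxonomy[path[-1]])
    return path


print(unique_beginners(HYPERNYM))      # ['thing']
print(terminal_nodes(HYPERNYM))        # ['Rhodesian Ridgeback', 'stool']
print(path_to_top("stool", HYPERNYM))  # ['stool', 'chair', 'furniture', 'thing']
```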

Semantic lexicons also distinguish between types and instances. A type of something shares the characteristics of that thing, as a Rhodesian Ridgeback is a type of dog. An instance is a single example of that thing, as Dave Grohl is an instance of a musician. Instances are always terminal nodes because they are solitary and don’t have other words or ontological categories belonging to them. [2]

Semantic lexicons also address meronymy, [5] which is a “part-to-whole” relationship, such as keys being part of a laptop. The attributes that define a specific entry are also necessarily present in that entry’s hyponyms. So, if a computer has keys, and a laptop is a type of computer, then a laptop must have keys. However, there are many cases where this inheritance becomes vague. A good example is the chair. Most would define a chair as having legs and a seat (the part one sits on). However, some artistic or modern chairs do not have legs at all. Beanbags also do not have legs, but few would argue that they aren't chairs. Questions like these drive research and work in the fields of taxonomy and ontology.
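A minimal sketch of this inheritance rule follows, assuming parts are stored on the most general entry that has them and collected by walking up the hierarchy. The tables `IS_A` and `HAS_PART` and the function `all_parts` are hypothetical names, and the data is a toy example.

```python
# Each entry maps to its immediate superordinate (None at the top of this toy data).
IS_A = {"laptop": "computer", "computer": None, "beanbag": "chair", "chair": None}

# Parts are recorded once, on the most general entry that has them.
HAS_PART = {"computer": {"keys"}, "chair": {"legs", "seat"}}


def all_parts(entry):
    """Collect the parts of an entry plus those inherited from its superordinates."""
    parts = set()
    while entry is not None:
        parts |= HAS_PART.get(entry, set())
        entry = IS_A.get(entry)
    return parts


print(sorted(all_parts("laptop")))   # ['keys']
print(sorted(all_parts("beanbag")))  # ['legs', 'seat']
```

The second result shows where strict inheritance breaks down: the rule assigns legs to a beanbag, which is exactly the kind of borderline case described above.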

Verbs

Verb synsets are arranged much like their noun counterparts: the more general and encompassing verbs are near the top of the hierarchy, while troponyms (verbs that describe a more specific way of doing something) are grouped beneath. Verb specificity moves along a vector, with the verbs becoming more and more specific in reference to a certain quality. [2] For example, the set "walk / run / sprint" becomes more specific in terms of speed, and "dislike / hate / abhor" becomes more specific in terms of the intensity of the emotion.

The ontological groupings and separations of verbs are far more arguable than those of nouns. It is widely accepted that a dog is a type of animal and that a stool is a type of chair, but it can be argued that abhor is on the same emotional plane as hate (that they are synonyms and not super/subordinates). It can also be argued that love and adore are synonyms, or that one is more specific than the other. Thus, the relations between verbs are not as agreed-upon as those between nouns.

Another attribute of verb synset relations is that they are also ordered into verb pairs. In these pairs, one verb necessarily entails the other, in the way that massacre entails kill and know entails believe. [2] These verb pairs can be troponyms and their superordinates, as is the case in the first example, or they can be in completely different ontological categories, as is the case in the second example.
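The two verb relations described in this section, troponymy and entailment, can be sketched as two separate tables. The data mirrors the examples in the text; the table names, function names, and return labels are hypothetical.

```python
# Each troponym maps to the more general verb directly above it.
TROPONYM_OF = {
    "sprint": "run",
    "run": "walk",
    "abhor": "hate",
    "hate": "dislike",
    "massacre": "kill",
}

# Entailment pairs: doing the key verb necessarily involves the listed verbs.
ENTAILS = {"massacre": {"kill"}, "know": {"believe"}}


def superordinates(verb):
    """Return the increasingly general verbs above `verb`, nearest first."""
    chain = []
    while verb in TROPONYM_OF:
        verb = TROPONYM_OF[verb]
        chain.append(verb)
    return chain


def classify_entailment(verb, entailed):
    """Say whether an entailment pair lies inside one troponym hierarchy."""
    if entailed not in ENTAILS.get(verb, set()):
        return "no entailment recorded"
    if entailed in superordinates(verb):
        return "troponym pair"
    return "cross-category pair"


print(superordinates("sprint"))                 # ['run', 'walk']
print(classify_entailment("massacre", "kill"))  # troponym pair
print(classify_entailment("know", "believe"))   # cross-category pair
```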

Adjectives

Adjective synset relations are very similar to verb synset relations. They are not quite as neatly hierarchical as the noun synset relations, and they have fewer tiers and more terminal nodes. However, there are generally fewer terminal nodes per ontological category in adjective synset relations than in verb synset relations. Adjectives in semantic lexicons are organized in word pairs as well, with the difference being that their word pairs are antonyms instead of entailments. More generic polar adjectives, such as hot and cold or happy and sad, are paired. Then other adjectives that are semantically similar are linked to each of these words. Hot is linked to warm, heated, sizzling, and sweltering, while cold is linked to cool, chilly, freezing, and nippy. These semantically similar adjectives are considered indirect antonyms [2] to the opposite polar adjective (e.g. nippy is an indirect antonym to hot). Adjectives that are derived from a verb or a noun are also directly linked to said verb or noun across sub-nets. For example, enjoyable is linked to the semantically similar adjectives agreeable and pleasant, as well as to its origin verb, enjoy.
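The antonym pairs and their clusters of similar adjectives can be sketched as follows, using the hot/cold example from the text. The table and function names are hypothetical, and the sketch assumes each similar adjective belongs to exactly one polar adjective.

```python
# Direct antonym pairs between polar adjectives.
ANTONYM = {"hot": "cold", "cold": "hot"}

# Semantically similar adjectives clustered around each polar adjective.
SIMILAR_TO = {
    "hot": {"warm", "heated", "sizzling", "sweltering"},
    "cold": {"cool", "chilly", "freezing", "nippy"},
}


def polar_head(adjective):
    """Return the polar adjective that `adjective` is, or clusters around."""
    if adjective in ANTONYM:
        return adjective
    for head, similar in SIMILAR_TO.items():
        if adjective in similar:
            return head
    return None


def antonyms(adjective):
    """Split the antonyms of an adjective into direct and indirect ones."""
    head = polar_head(adjective)
    if head is None:
        return {"direct": set(), "indirect": set()}
    opposite = ANTONYM[head]
    if adjective == head:
        # A polar adjective opposes its partner directly and, indirectly,
        # every adjective clustered around that partner.
        return {"direct": {opposite}, "indirect": set(SIMILAR_TO[opposite])}
    return {"direct": set(), "indirect": {opposite}}


print(sorted(antonyms("nippy")["indirect"]))  # ['hot']
print(sorted(antonyms("hot")["direct"]))      # ['cold']
```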

Adverbs

Very few adverbs are accounted for in semantic lexicons. This is because most adverbs are taken directly from their adjective counterparts, in both meaning and form, and changed only morphologically (e.g. happily is derived from happy, and luckily is derived from lucky, which is derived from luck). The only adverbs that are accounted for specifically are ones without these connections, such as really, mostly, and hardly. [2]
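One way to picture this is a lookup that resolves a derived adverb to its adjective and reserves separate entries for the exceptions. The suffix rule below is a deliberately crude assumption covering only "-ly" and "-ily", not a description of how any real lexicon derives adverbs; the names and the toy adjective "quick" are hypothetical.

```python
ADJECTIVES = {"happy", "lucky", "quick"}  # "quick" is an added toy entry
STANDALONE_ADVERBS = {"really", "mostly", "hardly"}


def adjective_base(adverb):
    """Return the adjective an adverb derives from, or None if it needs its own entry."""
    if adverb in STANDALONE_ADVERBS:
        return None
    if adverb.endswith("ily"):
        candidate = adverb[:-3] + "y"  # happily -> happy
    elif adverb.endswith("ly"):
        candidate = adverb[:-2]        # quickly -> quick
    else:
        return None
    return candidate if candidate in ADJECTIVES else None


print(adjective_base("happily"))  # happy
print(adjective_base("luckily"))  # lucky
print(adjective_base("hardly"))   # None
```

The exception list is checked first because a purely morphological rule would link hardly to hard, even though the meanings are unrelated.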

Challenges facing semantic lexicons

The effects of the Princeton WordNet project extend far past English, though most research in the field revolves around the English language. Creating semantic lexicons for other languages has proved to be very useful for natural language processing applications. One of the main focuses of research in semantic lexicons is linking lexicons of different languages to aid in machine translation. The most common approach is to attempt to create a shared ontology that serves as a “middleman” of sorts between the semantic lexicons of two different languages. [6] This is an extremely challenging and as-yet unsolved issue in the machine translation field. One issue arises from the fact that no two languages are word-for-word translations of each other. That is, every language has some sort of structural or syntactic difference from every other. In addition, languages often have words that don’t translate easily into other languages, and certainly not with an exact word-to-word match. Proposals have been made to create a set framework for wordnets. Research has shown that every known human language has some sort of concept resembling synonymy, hyponymy, meronymy, and antonymy. However, every idea so far proposed has been met with criticism for using a pattern that works best for English and less well for other languages. [6]
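The “middleman” idea can be sketched as two word lists that each point into a shared set of concept identifiers. Everything here is a toy assumption: the concept ids, the table names, and the word lists are hypothetical and do not come from any actual multilingual wordnet.

```python
# Each language maps its own words onto shared, language-neutral concept ids.
EN_TO_CONCEPT = {"dog": "C1", "chair": "C2"}
DE_TO_CONCEPT = {"Hund": "C1", "Stuhl": "C2", "Schadenfreude": "C3"}


def translate(word, source, target):
    """Translate by going through the shared concept rather than word to word."""
    concept = source.get(word)
    if concept is None:
        return []
    return sorted(w for w, c in target.items() if c == concept)


print(translate("Hund", DE_TO_CONCEPT, EN_TO_CONCEPT))           # ['dog']
print(translate("Schadenfreude", DE_TO_CONCEPT, EN_TO_CONCEPT))  # []
```

The empty result for the second lookup illustrates the gap described above: a concept lexicalized in one language may have no single-word counterpart in the other.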

Another obstacle in the field is that no solid guidelines exist for the framework and contents of a semantic lexicon. Each lexicon project in each different language has taken a slightly (or not so slightly) different approach to its wordnet. There is not even an agreed-upon definition of what a “word” is. Orthographically, a word is defined as a string of letters with spaces on either side, but semantically the definition is much debated. For example, it is not difficult to define dog or rod as words, but what about guard dog or lightning rod? The latter two examples would be considered orthographically separate words, though semantically each makes up one concept: one is a type of dog and one is a type of rod. In addition to these confusions, wordnets are also idiosyncratic, in that they do not consistently label items. They are redundant, in that they often have several words assigned to each meaning (synsets). They are also open-ended, in that they often focus on and extend into terminology and domain-specific vocabulary. [6]
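The gap between orthographic and semantic words can be illustrated with a greedy longest-match segmenter, which prefers a multi-word entry over its individual parts. This is one common heuristic chosen for illustration, not a method attributed to any particular wordnet, and the entry list and function name are hypothetical.

```python
ENTRIES = {"dog", "rod", "guard", "guard dog", "lightning rod"}


def segment(text, entries, max_words=3):
    """Split text into lexicon entries, preferring the longest match at each position."""
    tokens = text.lower().split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first; a single token is always accepted.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in entries:
                result.append(candidate)
                i += n
                break
    return result


print(segment("the guard dog sat by the lightning rod", ENTRIES))
# ['the', 'guard dog', 'sat', 'by', 'the', 'lightning rod']
```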

References

  1. Theng, Yin-Leng (2009). Handbook of Research on Digital Libraries: Design, Development, and Impact. University of Michigan: Information Science Reference. ISBN 9781599048796.
  2. "About WordNet".
  3. Lemnitzer, L. "Enriching GermaNet: a case study of lexical acquisition". Seminar für Sprachwissenschaft, Universität Tübingen.
  4. Boyd-Graber, J. (2006). "Adding Dense, Weighted Connections to WordNet". Proceedings of the Third International Wordnet Conference.
  5. Hinrichs, E. (December 2012). "Using part-whole relations for automatic deduction of compound-international relations in GermaNet". International Journal on Semantic Web and Information Systems. 3.
  6. Fellbaum, C. (May 2012). "Challenges for a Multilingual Wordnet". Language Resources and Evaluation. 46 (2): 313–326. doi:10.1007/s10579-012-9186-z. S2CID 254379442.