In linguistics, a word sense is one of the meanings of a word. For example, a dictionary may have over 50 different senses of the word "play", each having a different meaning based on the context of the word's usage in a sentence, as follows:
We went to see the play Romeo and Juliet at the theater.
The coach devised a great play that put the visiting team on the defensive.
The children went out to play in the park.
In each sentence different collocates of "play" signal its different meanings.
People and computers, as they read words, must use a process called word-sense disambiguation[1][2] to reconstruct the likely intended meaning of a word. This process uses context to narrow the possible senses down to the probable ones. The context includes such things as the ideas conveyed by adjacent words and nearby phrases, the known or probable purpose and register of the conversation or document, and the orientation (time and place) implied or expressed. The disambiguation is thus context-sensitive.
Advanced semantic analysis has resulted in a sub-distinction. A word sense corresponds either neatly to a seme (the smallest possible unit of meaning) or to a sememe (a larger unit of meaning), and polysemy of a word or phrase is the property of having multiple semes or sememes and thus multiple senses.
Often the senses of a word are related to each other within a semantic field. A common pattern is that one sense is broader and another narrower. This is often the case in technical jargon, where the target audience uses a narrower sense of a word that a general audience would tend to take in its broader sense. For example, in casual use "orthography" will often be glossed for a lay audience as "spelling", but in linguistic usage "orthography" (comprising spelling, casing, spacing, hyphenation, and other punctuation) is a hypernym of "spelling". Besides jargon, however, the pattern is common even in general vocabulary. Examples are the variation in senses of the term "wood wool" and in those of the word "bean". This pattern entails that natural language can often lack explicitness about hyponymy and hypernymy. Much more than programming languages do, it relies on context instead of explicitness; meaning is implicit within a context. Common examples are as follows:
Usage labels of "sensu" plus a qualifier, such as "sensu stricto" ("in the strict sense") or "sensu lato" ("in the broad sense"), are sometimes used to clarify the sense in which a term is meant in a text.
Polysemy entails a common historical root for a word or phrase. Broad medical terms, usually followed by qualifiers such as those relating to particular conditions or anatomical locations, are polysemic, and older conceptual words are, with few exceptions, highly polysemic (usually going beyond shades of similar meaning into outright ambiguity).
Homonymy is where two separate-root words (lexemes) happen to have the same spelling and pronunciation.
A definition is a statement of the meaning of a term. Definitions can be classified into two large categories: intensional definitions, and extensional definitions. Another important category of definitions is the class of ostensive definitions, which convey the meaning of a term by pointing out examples. A term may have many different senses and multiple meanings, and thus require multiple definitions.
Lexicology is the branch of linguistics that analyzes the lexicon of a specific language. A word is the smallest meaningful unit of a language that can stand on its own, and is made up of small components called morphemes and even smaller elements known as phonemes, or distinguishing sounds. Lexicology examines every feature of a word – including formation, spelling, origin, usage, and definition.
WordNet is a lexical database of semantic relations between words that links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into synsets with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. It was first created in the English language and the English WordNet database and software tools have been released under a BSD style license and are freely available for download from that WordNet website. There are now WordNets in more than 200 languages.
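The kinds of relations WordNet encodes can be sketched with plain data structures. The snippet below is a toy illustration only: the synset identifiers, glosses, and relation tables are invented for the example, not actual WordNet entries (the real English WordNet is typically queried through a library such as NLTK's `nltk.corpus.wordnet`).

```python
# Toy sketch of WordNet-style semantic relations using plain dicts.
# Synset names and glosses below are illustrative, not real WordNet ids.

# Each synset groups synonymous lemmas and carries a short gloss.
synsets = {
    "bird.n.01":   {"lemmas": ["bird"], "gloss": "warm-blooded egg-laying vertebrate"},
    "pigeon.n.01": {"lemmas": ["pigeon", "dove"], "gloss": "stout-bodied bird"},
    "wing.n.01":   {"lemmas": ["wing"], "gloss": "forelimb adapted for flight"},
}

# Relations link whole synsets: hyponymy (subtype) and meronymy (part-of).
hypernym = {"pigeon.n.01": "bird.n.01"}   # a pigeon is a kind of bird
meronyms = {"bird.n.01": ["wing.n.01"]}   # a wing is a part of a bird

def synonyms(synset_id):
    """All lemmas grouped in one synset are mutual synonyms."""
    return synsets[synset_id]["lemmas"]

print(synonyms("pigeon.n.01"))   # lemmas in the same synset
print(hypernym["pigeon.n.01"])   # the supertype of the pigeon synset
```

This mirrors the structure described above: the dictionary-like part (lemmas plus glosses) and the thesaurus-like part (relations between synsets) live in one database.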
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
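One classic computational approach to WSD is the Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below uses a tiny invented sense inventory for "play" (the sense ids and glosses are assumptions for illustration, not drawn from any real lexicon).

```python
# Minimal sketch of the simplified Lesk algorithm: choose the sense whose
# gloss has the largest word overlap with the context. The sense inventory
# below is invented for illustration.

SENSES = {
    "play.theater":  "a dramatic work staged in a theater by actors",
    "play.sport":    "a planned maneuver or tactic in a team sport or game",
    "play.recreate": "to engage in activity for enjoyment and recreation",
}

def lesk(context):
    """Return the sense id whose gloss overlaps most with the context words."""
    ctx = set(context.lower().split())

    def overlap(sense):
        return len(ctx & set(SENSES[sense].split()))

    return max(SENSES, key=overlap)

print(lesk("we went to see the play staged at the theater"))
```

Real systems refine this with stemming, stop-word removal, and expanded glosses, but the core idea of measuring gloss/context overlap is the same.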
A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words begin, start, commence, and initiate are all synonyms of one another: they are synonymous. The standard test for synonymy is substitution: one form can be replaced by another in a sentence without changing its meaning. Words are considered synonymous in only one particular sense: for example, long and extended in the context long time or extended time are synonymous, but long cannot be used in the phrase extended family. Synonyms with exactly the same meaning share a seme or denotational sememe, whereas those with inexactly similar meanings share a broader denotational or connotational sememe and thus overlap within a semantic field. The former are sometimes called cognitive synonyms and the latter, near-synonyms, plesionyms or poecilonyms.
Polysemy is the capacity for a sign to have multiple related meanings. For example, a word can have several word senses. Polysemy is distinct from monosemy, where a word has a single meaning.
Hyponymy and hypernymy are semantic relations between a term denoting a subtype (the hyponym) and a term denoting its supertype (the hypernym). The semantic field of the hyponym is included within that of the hypernym. For example, pigeon, crow, and eagle are all hyponyms of bird, their hypernym.
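Because hyponymy is transitive (if a pigeon is a kind of bird and a bird is a kind of animal, a pigeon is also a hyponym of animal), it can be checked by walking up a hypernym chain. The table below is a toy example using the birds mentioned above; the "animal" entry is an added illustrative assumption.

```python
# Transitive hyponymy over a toy hypernym table (entries are illustrative).
HYPERNYM = {"pigeon": "bird", "crow": "bird", "eagle": "bird", "bird": "animal"}

def is_hyponym_of(term, candidate):
    """Walk up the hypernym chain from `term`, looking for `candidate`."""
    while term in HYPERNYM:
        term = HYPERNYM[term]
        if term == candidate:
            return True
    return False

print(is_hyponym_of("pigeon", "animal"))  # True, via bird -> animal
```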
Lexical semantics, as a subfield of linguistic semantics, is the study of word meanings. It includes the study of how words structure their meaning, how they act in grammar and compositionality, and the relationships between the distinct senses and uses of a word.
Semantic change is a form of language change regarding the evolution of word usage—usually to the point that the modern meaning is radically different from the original usage. In diachronic linguistics, semantic change is a change in one of the meanings of a word. Every word has a variety of senses and connotations, which can be added, removed, or altered over time, often to the extent that cognates across space and time have very different meanings. The study of semantic change can be seen as part of etymology, onomasiology, semasiology, and semantics.
In linguistics, semantic analysis is the process of relating syntactic structures, from the levels of words, phrases, clauses, sentences and paragraphs to the level of the writing as a whole, to their language-independent meanings. It also involves removing features specific to particular linguistic and cultural contexts, to the extent that such a project is possible. The elements of idiom and figurative speech, being cultural, are often also converted into relatively invariant meanings in semantic analysis. Semantics, although related to pragmatics, is distinct in that the former deals with word or sentence choice in any given context, while pragmatics considers the unique or particular meaning derived from context or tone. To reiterate in different terms, semantics is about universally coded meaning, and pragmatics, the meaning encoded in words that is then interpreted by an audience.
In lexicography, a lexical item is a single word, a part of a word, or a chain of words (catena) that forms the basic elements of a language's lexicon (≈ vocabulary). Examples are cat, traffic light, take care of, by the way, and it's raining cats and dogs. Lexical items can be generally understood to convey a single meaning, much as a lexeme, but are not limited to single words. Lexical items are like semes in that they are "natural units" translating between languages, or in learning a new language. In this last sense, it is sometimes said that language consists of grammaticalized lexis, and not lexicalized grammar. The entire store of lexical items in a language is called its lexis.
In linguistics, a semantic field is a lexical set of words grouped semantically that refers to a specific subject. The term is also used in anthropology, computational semiotics, and technical exegesis.
Sensu is a Latin word meaning "in the sense of". It is used in a number of fields including biology, geology, linguistics, semiotics, and law. Commonly it refers to how strictly or loosely an expression is used in describing any particular concept, but it also appears in expressions that indicate the convention or context of the usage.
A semantic loan is a process of borrowing semantic meaning from another language, very similar to the formation of calques. In this case, however, the complete word in the borrowing language already exists; the change is that its meaning is extended to include another meaning its existing translation has in the lending language. Calques, loanwords and semantic loans are often grouped roughly under the phrase "borrowing". Semantic loans often occur when two languages are in close contact, and they take various forms. The source and target word may be cognates, which may or may not share any contemporary meaning in common; they may be an existing loan translation or parallel construction; or they may be unrelated words that share an existing meaning.
In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word. Given that the output of word-sense induction is a set of senses for the target word, this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.
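Since WSI has no predefined sense inventory, a common strategy is to cluster the contexts in which the target word occurs, so that each cluster stands for one induced sense. The sketch below does this with bag-of-words contexts and greedy single-link clustering by Jaccard overlap; the sentences, the threshold, and the clustering method are all illustrative assumptions (real systems use vector-space or graph clustering).

```python
# Minimal sketch of word-sense induction: cluster the contexts of a target
# word ("bank") so each cluster corresponds to one induced sense. Contexts,
# threshold, and method are toy choices for illustration.

def jaccard(a, b):
    """Jaccard overlap between two word sets."""
    return len(a & b) / len(a | b)

def induce_senses(contexts, threshold=0.3):
    """Greedy single-link clustering of bag-of-words contexts."""
    clusters = []
    for ctx in contexts:
        bag = set(ctx.lower().split())
        for cluster in clusters:
            if any(jaccard(bag, other) >= threshold for other in cluster):
                cluster.append(bag)
                break
        else:
            clusters.append([bag])   # no similar cluster: start a new sense
    return clusters

contexts = [
    "deposit money at the bank branch",
    "the bank approved my money loan",
    "fishing on the river bank",
    "the river bank was muddy",
]
clusters = induce_senses(contexts)
print(len(clusters))   # two induced senses of "bank"
```

The output of such a procedure (a set of context clusters per word) is exactly the kind of sense inventory that a downstream WSD system could then assign.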
Classic monolingual word-sense disambiguation evaluation tasks use WordNet as the sense inventory and are largely based on supervised or semi-supervised classification with manually sense-annotated corpora.
In linguistics, an expression is semantically ambiguous when it can have multiple meanings. The more senses a word has, the higher its degree of ambiguity. Like other kinds of ambiguity, semantic ambiguities are often clarified by context or by prosody. One's comprehension of a sentence containing a semantically ambiguous word is strongly influenced by the general structure of the sentence. The language itself can also contribute to the overall effect of semantic ambiguity, in that the level of ambiguity in a context can change depending on whether a language boundary is crossed.
Cognitive sociolinguistics is an emerging field of linguistics that aims to account for linguistic variation in social settings with a cognitive explanatory framework. The goal of cognitive sociolinguists is to build a mental model of society, individuals, institutions and their relations to one another. Cognitive sociolinguists also strive to combine theories and methods used in cognitive linguistics and sociolinguistics to provide a more productive framework for future research on language variation. This burgeoning field, concerning the social implications of cognitive linguistics, has not yet received universal recognition.
In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.
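The notion that "closer in the vector space" means "similar in meaning" is usually made precise with cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors, which are purely illustrative assumptions; real embeddings (e.g. from word2vec or GloVe) are learned from large corpora and have hundreds of dimensions.

```python
# Cosine similarity between toy word vectors. The 3-d vectors below are
# hand-made for illustration; real embeddings are learned from corpora.
import math

embedding = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words with related meanings should score higher than unrelated ones.
print(cosine(embedding["king"], embedding["queen"]))
print(cosine(embedding["king"], embedding["apple"]))
```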
An interlingual homograph is a word that occurs in more than one written language but has a different meaning or pronunciation in each. For example, the word "done" is an adjective in English, a verb form in Spanish, and a noun in Czech.