Glossary

[Image: A Glossary of Islamic Legal Terminology, by Ahmad ibn Muhammad ibn 'Ali al-Muqri al-Fayyumi (Walters manuscript W.590)]

A glossary (from Ancient Greek γλῶσσα, "language, speech, wording"), also known as a vocabulary or clavis, is an alphabetical list of terms in a particular domain of knowledge with definitions for those terms. Traditionally, a glossary appears at the end of a book and includes terms within that book that are either newly introduced, uncommon, or specialized. While glossaries are most commonly associated with non-fiction books, in some cases novels may come with a glossary for unfamiliar terms.

A bilingual glossary is a list of terms in one language defined in a second language or glossed by synonyms (or at least near-synonyms) in another language.

In a general sense, a glossary contains explanations of concepts relevant to a certain field of study or action. In this sense, the term is related to the notion of ontology. Automatic methods have also been developed that transform a glossary into an ontology [1] or a computational lexicon [2].
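
As a rough illustration of the underlying idea (a sketch, not the method of the cited papers), the following Python snippet derives "is-a" links for a small ontology by reading the hypernym off the head of each textual definition. The glossary entries and the single extraction pattern are invented for the example.

    import re

    # A toy glossary: each term maps to its textual definition (gloss).
    glossary = {
        "glossary": "an alphabetical list of terms in a particular domain with definitions",
        "dictionary": "a reference work that lists words with their meanings",
        "thesaurus": "a reference work that groups words by similarity of meaning",
    }

    def hypernym_of(gloss):
        """Rough heuristic: the noun phrase between the leading article
        and the first preposition or relative pronoun is the hypernym."""
        m = re.match(r"(?:an|a|the)\s+(.+?)\s+(?:of|in|for|that|with|by)\b", gloss)
        return m.group(1) if m else None

    # Each definition yields one "is-a" edge of a small ontology.
    for term, gloss in glossary.items():
        print(term, "--is-a-->", hypernym_of(gloss))
    # glossary --is-a--> alphabetical list
    # dictionary --is-a--> reference work
    # thesaurus --is-a--> reference work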

History

In medieval Europe, glossaries with equivalents for Latin words in vernacular or simpler Latin were in use, such as the Leiden Glossary (ca. 800 CE).

Core glossary

[Image: An intelligence law glossary (2011), which describes key terms in intelligence law]

A core glossary is a simple glossary or defining dictionary that enables definition of other concepts, especially for newcomers to a language or field of study. It contains a small working vocabulary and definitions for important or frequently encountered concepts, usually including idioms or metaphors useful in a culture.
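
A minimal sketch of how a core glossary can be put to work, assuming an invented core vocabulary: verify that every definition in a glossary is phrased using only core words, so that a newcomer who knows the core can read every entry.

    # Invented core vocabulary and glossary entries for illustration.
    CORE = {"a", "an", "the", "of", "and", "with", "their",
            "list", "book", "words", "meanings"}

    glossary = {
        "dictionary": "a book of words with their meanings",
        "lexicon": "a list of words and their meanings",
        "gloss": "a brief marginal explanation of a word",
    }

    def uses_only_core(definition):
        """True if every word of the definition is in the core vocabulary."""
        return all(word in CORE for word in definition.lower().split())

    for term, definition in glossary.items():
        status = "ok" if uses_only_core(definition) else "uses non-core words"
        print(term, "->", status)
    # dictionary -> ok
    # lexicon -> ok
    # gloss -> uses non-core words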

Automatic extraction of glossaries

Computational approaches to the automated extraction of glossaries from corpora [3] or the Web [4] [5] have been developed in recent years. These methods typically start from a domain terminology and extract one or more glosses for each term of interest. The glosses can then be analyzed to extract hypernyms of the defined term and other lexical and semantic relations.
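
A sketch of the definitional-pattern idea such methods build on: scan a corpus for sentences of the form "a TERM is a/an ..." and treat the matches as candidate glosses. The corpus and the single pattern are invented for illustration; published systems use far richer patterns and filtering.

    import re

    # A toy corpus of sentences (invented for the example).
    corpus = [
        "A glossary is an alphabetical list of specialized terms with definitions.",
        "The report was published in 2011.",
        "An ontology is a formal representation of concepts and their relations.",
    ]

    def candidate_glosses(term, sentences):
        """Sentences matching a simple definitional pattern for the term,
        e.g. 'A <term> is a/an/the ...'."""
        pattern = re.compile(rf"\b{re.escape(term)}\s+is\s+(?:an|a|the)\b",
                             re.IGNORECASE)
        return [s for s in sentences if pattern.search(s)]

    print(candidate_glosses("glossary", corpus))
    # ['A glossary is an alphabetical list of specialized terms with definitions.']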

Related Research Articles

Semantic network

A semantic network, or frame network, is a knowledge base that represents semantic relations between concepts in a network. It is often used as a form of knowledge representation: a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map.
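
For illustration, a small semantic network can be held as a set of labeled, directed edges; the concepts and relations below are invented.

    # A tiny semantic network: directed, labeled edges between concepts.
    edges = [
        ("canary", "is-a", "bird"),
        ("bird", "is-a", "animal"),
        ("bird", "has-part", "wing"),
        ("canary", "has-color", "yellow"),
    ]

    def neighbors(concept, relation=None):
        """Concepts one hop away from `concept`, optionally restricted
        to a single relation label."""
        return [(rel, dst) for src, rel, dst in edges
                if src == concept and relation in (None, rel)]

    print(neighbors("canary"))        # [('is-a', 'bird'), ('has-color', 'yellow')]
    print(neighbors("bird", "is-a"))  # [('is-a', 'animal')]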

WordNet: Computational lexicon of English

WordNet is a lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into synsets with short definitions and usage examples. WordNet can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. WordNet was first created in English; the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website.
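
As a usage sketch, the NLTK library ships an interface to the English WordNet; the snippet assumes NLTK is installed (pip install nltk) and downloads the WordNet data on first run.

    import nltk
    nltk.download("wordnet", quiet=True)  # fetch the WordNet data once
    from nltk.corpus import wordnet as wn

    # Synsets group synonymous words and carry a gloss (definition).
    for synset in wn.synsets("glossary"):
        print(synset.name(), "->", synset.definition())

    # Navigate semantic relations, e.g. hypernyms ("is-a" parents).
    first = wn.synsets("glossary")[0]
    print([h.name() for h in first.hypernyms()])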

In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

Word-sense disambiguation (WSD) is an open problem in computational linguistics concerned with identifying which sense of a word is used in a sentence. Its solution affects other language-processing tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.
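
A self-contained sketch of one classic approach, a simplified Lesk algorithm: choose the sense whose gloss shares the most words with the surrounding context. The two-sense inventory is invented for the example.

    # Simplified Lesk word-sense disambiguation over an invented inventory.
    senses = {
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "the sloping land beside a body of water such as a river",
    }

    def disambiguate(context):
        """Return the sense whose gloss overlaps most with the context."""
        context_words = set(context.lower().split())
        def overlap(sense):
            return len(context_words & set(senses[sense].lower().split()))
        return max(senses, key=overlap)

    print(disambiguate("she sat on the bank of the river watching the water"))
    # bank/river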

Wiktionary: Free online dictionary that anyone can edit

Wiktionary is a multilingual, web-based project to create a free-content dictionary of terms in all natural languages and in a number of artificial languages. These entries may contain definitions, images for illustration, pronunciations, etymologies, inflections, usage examples, quotations, related terms, and translations of words into other languages, among other features. It is collaboratively edited via a wiki. Its name is a portmanteau of the words wiki and dictionary. It is available in 171 languages and in Simple English. Like its sister project Wikipedia, Wiktionary is run by the Wikimedia Foundation, and is written collaboratively by volunteers, dubbed "Wiktionarians". Its wiki software, MediaWiki, allows almost anyone with access to the website to create and edit entries.

Terminology is a general word for the group of specialized words or meanings relating to a particular field, and also for the study of such terms and their use; the latter is also known as terminology science. Terms are words and compound words or multi-word expressions that in specific contexts are given specific meanings, which may deviate from the meanings the same words have in other contexts and in everyday language. Terminology is a discipline that studies, among other things, the development of such terms and their interrelationships within a specialized domain. Terminology differs from lexicography in that it studies concepts, conceptual systems, and their labels (terms), whereas lexicography studies words and their meanings.

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content, as opposed to lexicographical similarity. Semantic similarity measures are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts, or instances, through a numerical description obtained by comparing information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness: semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".
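
As a sketch of a path-based measure over a toy "is-a" taxonomy (invented, echoing the car/bus example): the closer two terms meet in the taxonomy, the more similar they are.

    # Toy "is-a" taxonomy, child -> parent.
    parent = {
        "car": "motor vehicle", "bus": "motor vehicle",
        "motor vehicle": "vehicle", "bicycle": "vehicle",
        "vehicle": "artifact", "road": "artifact",
    }

    def ancestors(node):
        """The node itself followed by its chain of parents."""
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    def similarity(a, b):
        """1 / (1 + length of the path joining a and b through their
        lowest common ancestor)."""
        pa, pb = ancestors(a), ancestors(b)
        common = next(n for n in pa if n in pb)
        return 1.0 / (1 + pa.index(common) + pb.index(common))

    print(similarity("car", "bus"))   # 0.33: they share 'motor vehicle'
    print(similarity("car", "road"))  # 0.20: they only meet at 'artifact'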

The semantic spectrum is a series of increasingly precise, or increasingly semantically expressive, definitions for data elements in knowledge representations, especially for machine use.

Terminology extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.
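
A minimal sketch of one common termhood heuristic, using invented toy corpora: rank words by how much more frequent they are in the domain corpus than in a general background corpus.

    from collections import Counter

    # Invented toy corpora (already tokenized) for illustration.
    domain = ("the ontology maps concepts the ontology links terms "
              "terms carry glosses").split()
    background = "the cat sat on the mat the dog ran in the park".split()

    dom_counts, bg_counts = Counter(domain), Counter(background)

    def termhood(word):
        """Relative frequency in the domain corpus divided by relative
        frequency in the background corpus (add-one smoothed)."""
        dom_rel = dom_counts[word] / len(domain)
        bg_rel = (bg_counts[word] + 1) / (len(background) + 1)
        return dom_rel / bg_rel

    ranked = sorted(set(domain), key=termhood, reverse=True)
    print(ranked[:3])  # domain-specific words such as 'ontology' and 'terms'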

Linguistic categories include lexical categories (parts of speech), syntactic categories, and grammatical categories such as tense, number, and gender.

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.
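
An illustrative sketch, assuming documents and the query are already represented as concept vectors (the three-dimensional vectors here are invented): rank documents by cosine similarity to the query instead of by shared keywords.

    import math

    # Invented toy concept vectors; real systems derive them from
    # embeddings or latent semantic analysis.
    docs = {
        "doc-finance": [0.9, 0.1, 0.0],
        "doc-rivers": [0.1, 0.8, 0.3],
        "doc-boats": [0.2, 0.6, 0.7],
    }

    def cosine(u, v):
        """Cosine of the angle between two vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    query = [0.0, 0.9, 0.5]  # a query about rivers and water travel
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query), reverse=True)
    print(ranked)  # ['doc-rivers', 'doc-boats', 'doc-finance']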

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

DSSim is an ontology mapping system conceived to achieve a certain level of the envisioned machine intelligence on the Semantic Web. The main driving factor behind its development was to provide an alternative to existing heuristics- or machine learning-based approaches: a multi-agent approach that makes use of uncertain reasoning. The system provides a possible way to establish machine understanding over Semantic Web data through multi-agent beliefs and conflict resolution.

Taxonomy is the practice and science of categorization or classification based on discrete sets. The word finds its roots in the Greek τάξις (taxis, "arrangement") and νόμος (nomos, "law").

BabelNet

BabelNet is a multilingual lexicalized semantic network and ontology developed at the NLP group of the Sapienza University of Rome. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions in many languages harvested from both WordNet and Wikipedia.

Automatic taxonomy construction (ATC) is the use of software programs to generate taxonomical classifications from a body of texts called a corpus. ATC is a branch of natural language processing, which in turn is a branch of artificial intelligence.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, and has since been a focal point of activity for several W3C community groups, research projects, and infrastructure efforts.

Paola Velardi: Professor of computer science

Paola Velardi is a full professor of computer science at Sapienza University in Rome, Italy. She was born in Rome on April 26, 1955. Her research encompasses natural language processing, machine learning, business intelligence, and the Semantic Web, in particular web information extraction. Velardi is one of the hundred female scientists included in the database "100esperte.it", an online, open database that champions the recognition of top-rated female scientists in science, technology, engineering, and mathematics (STEM).

In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to integrate data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – with free-form semantics.
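
As a minimal sketch, a knowledge graph can be stored as subject-predicate-object triples and queried by pattern matching; the facts below restate statements made earlier in this article.

    # A tiny knowledge graph of subject-predicate-object triples.
    triples = {
        ("Rome", "capital_of", "Italy"),
        ("Sapienza_University", "located_in", "Rome"),
        ("BabelNet", "developed_at", "Sapienza_University"),
    }

    def query(s=None, p=None, o=None):
        """Match triples against a pattern; None is a wildcard."""
        return [t for t in triples
                if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]

    print(query(p="located_in"))  # who is located where
    print(query(s="BabelNet"))    # everything stated about BabelNet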

References

  1. R. Navigli, P. Velardi. From Glossaries to Ontologies: Extracting Semantic Structure from Textual Definitions. In Ontology Learning and Population: Bridging the Gap between Text and Knowledge (P. Buitelaar and P. Cimiano, eds.), Frontiers in Artificial Intelligence and Applications, IOS Press, 2008, pp. 71–87.
  2. R. Navigli. Using Cycles and Quasi-Cycles to Disambiguate Dictionary Glosses. In Proc. of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009), Athens, Greece, March 30–April 3, 2009, pp. 594–602.
  3. J. Klavans, S. Muresan. Evaluation of the DEFINDER System for Fully Automatic Glossary Construction. In Proc. of the American Medical Informatics Association Symposium, 2001, pp. 324–328.
  4. A. Fujii, T. Ishikawa. Utilizing the World Wide Web as an Encyclopedia: Extracting Term Descriptions from Semi-Structured Texts. In Proc. of the 38th Annual Meeting of the Association for Computational Linguistics, 2000, pp. 488–495.
  5. P. Velardi, R. Navigli, P. D'Amadio. Mining the Web to Create Specialized Glossaries. IEEE Intelligent Systems, 23(5), IEEE Press, 2008, pp. 18–25.