GermaNet

Last updated

GermaNet is a semantic network for the German language. It relates nouns, verbs, and adjectives semantically by grouping lexical units that express the same concept into synsets and by defining semantic relations between these synsets. [1] GermaNet is free for academic use, after signing a license. GermaNet has much in common with the English WordNet and can be viewed as an on-line thesaurus or a light-weight ontology. GermaNet has been developed and maintained at the University of Tübingen since 1997 within the research group for General and Computational Linguistics. It has been integrated into the EuroWordNet, a multilingual lexical-semantic database. [2]

Contents

Database

Contents

GermaNet partitions the lexical space into a set of concepts that are interlinked by semantic relations. A semantic concept is modeled by a synset . A synset is a set of words (called lexical units) where all the words are taken to have the same or almost the same meaning. Thus a synset is a set of synonyms grouped under one definition, or "gloss".

In addition to the gloss, synsets are labeled with their syntactic function and accompanied by example sentences for each distinct meaning in the synset. [3] Just as in WordNet, for each word category the semantic space is divided into a number of semantic fields closely related to major nodes in the semantic network: Ort, or "location", Körper, or "body", etc. [2]

As of version 15.0 (release May 2020), GermaNet contains: [2]

Format

All GermaNet data is stored in a PostgreSQL relational database. The database schema follows the internal structure of GermaNet: there are tables to store synsets, lexical units, conceptual and lexical relations, etc. [3] GermaNet data is distributed both in this database format and as XML files. In the XML data, two types of files, one for synsets and the other for relations, represent all data available in the GermaNet database. [4]

Interfaces

There are software libraries and APIs available for Java, Python, JavaScript, and Perl [5] [6] . These programs are distributed under free-software licenses and provide easy access to all information in various versions of GermaNet.

GermaNet Rover is an online application that can be used to search for synsets in GermaNet, explore the data associated with them, and calculate the semantic similarity of pairs of synsets. It features visualizations of the hypernym relation and advanced filtering options for synset searching.

Licenses

GermaNet 15.0 (released May 2020) can be distributed under one of the following types of license agreements: [7]

Alternatives

Open-de-WordNet is freely available alternative to GermaNet which is compatible with WordNet. [8]

Linguistic Applications

GermaNet has been used for a variety of applications, including:

See also

Related Research Articles

Semantic network knowledge representation scheme that uses a directed graph to encode knowledge

A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map.

WordNet Computational lexicon of English

WordNet is a lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into synsets with short definitions and usage examples. WordNet can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. WordNet was first created in the English language and the English WordNet database and software tools have been released under a BSD style license and are freely available for download from that WordNet website.

In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

In computational linguistics, word-sense disambiguation (WSD) is an open problem concerned with identifying which sense of a word is used in a sentence. The solution to this issue impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".

Semantic lexicon

A semantic lexicon is a digital dictionary of words labeled with semantic classes so associations can be drawn between words that have not previously been encountered. Semantic lexicons are built upon semantic networks, which represent the semantic relations between words. The difference between a semantic lexicon and a semantic network is that a semantic lexicon has definitions for each word, or a "gloss".

The sequence between semantic related ordered words is classified as a lexical chain. A lexical chain is a sequence of related words in writing, spanning short or long distances. A chain is independent of the grammatical structure of the text and in effect it is a list of words that captures a portion of the cohesive structure of the text. A lexical chain can provide a context for the resolution of an ambiguous term and enable identification of the concept that the term represents.

Linguistic categories include

DOGMA, short for Developing Ontology-Grounded Methods and Applications, is the name of research project in progress at Vrije Universiteit Brussel's STARLab, Semantics Technology and Applications Research Laboratory. It is an internally funded project, concerned with the more general aspects of extracting, storing, representing and browsing information.

The eXtended WordNet is a project at the University of Texas at Dallas that aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available for automatic knowledge processing systems. It is freely available under a BSD style license. Although it has not been updated since November 2004, it still remains a useful resource.

In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of one or several dictionaries, e.g., in the form of a database.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criteria is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

BabelNet multilingual semantic network and encyclopedic dictionary

BabelNet is a multilingual lexicalized semantic network and ontology developed at the NLP group of the Sapienza University of Rome. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions in many languages harvested from both WordNet and Wikipedia.

plWordNet is a lexico-semantic database of the Polish language. It includes sets of synonymous lexical units (synsets) followed by short definitions. plWordNet serves as a thesaurus-dictionary where concepts (synsets) and individual word meanings are defined by their location in the network of mutual relations, reflecting the lexico-semantic system of the Polish language. plWordNet is also used as one of the basic resources for the construction of natural language processing tools for Polish.

The Bulgarian Sense-annotated Corpus (BulSemCor) is a structured corpus of Bulgarian texts in which each lexical item is assigned a sense tag. BulSemCor was created by the Department of Computational Linguistics at the Institute for Bulgarian Language of the Bulgarian Academy of Sciences.

The Bulgarian WordNet (BulNet) is an electronic multilingual dictionary of synonym sets along with their explanatory definitions and sets of semantic relations with other words in the language.

UBY is a large-scale lexical-semantic resource for natural language processing (NLP) developed at the Ubiquitous Knowledge Processing Lab (UKP) in the department of Computer Science of the Technische Universität Darmstadt . UBY is based on the ISO standard Lexical Markup Framework (LMF) and combines information from several expert-constructed and collaboratively constructed resources for English and German.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, but has been a point of focal activity for several W3C community groups, research projects, and infrastructure efforts since then.

Arabic Ontology linguistic ontology, tree of the meaning of the Arabic terms

Arabic Ontology is a linguistic ontology for the Arabic language, which can be used as an Arabic Wordnet with ontologically-clean content. People use it also as a tree of the concepts/meanings of the Arabic terms. It is a formal representation of the concepts that the Arabic terms convey, and its content is ontologically well-founded, and benchmarked to scientific advances and rigorous knowledge sources rather than to speakers’ naïve beliefs as wordnets typically do . The Ontology tree can be explored online.

OntoLex is the short name of a vocabulary for lexical resources in the web of data (OntoLex-Lemon) and the short name of the W3C community group that created it.

References

  1. Petra Storjohann (23 June 2010). Lexical-semantic relations: theoretical and practical perspectives. John Benjamins Publishing Company. pp. 165–. ISBN   978-90-272-3138-3 . Retrieved 16 November 2011.
  2. 1 2 3 "GermaNet - an Introduction". uni-tuebingen.de. Retrieved October 1, 2020.
  3. 1 2 V. Henrich, E. Hinrichs. 2010. GernEdiT - The GermaNet Editing Tool. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation.
  4. "Data format" . Retrieved October 1, 2020.
  5. "Applications and Tools". uni-tuebingen.de. Retrieved October 1, 2020.
  6. "GermaNet::Flat". metacpan.org. Retrieved October 1, 2020.
  7. "Licenses". uni-tuebingen.de. Retrieved October 1, 2020.
  8. "GitHub - hdaSprachtechnologie/odenet: Open German WordNet". November 14, 2019. Retrieved November 20, 2019 via GitHub.
  9. 1 2 3 Manuela Kunze and Dietmar Rösner. 2004. Issues in Exploiting GermaNet as a Resource in Real Applications.
  10. Sabine Schulte im Walde, 2004. GermaNet Synsets as Selectional Preferences in Semantic Verb Clustering.
  11. Saito et al., 2002. Evaluation of GermanNet: Problems Using GermaNet for Automatic Word Sense Disambiguation.