OntoLex is the short name of a vocabulary for lexical resources in the web of data (OntoLex-Lemon) and the short name of the W3C community group that created it (W3C Ontology-Lexica Community Group). [1]
OntoLex-Lemon is a vocabulary for publishing lexical data as a knowledge graph, in an RDF format and/or as Linguistic Linked Open Data. Since its publication as a W3C Community report in 2016, [2] it has served as "a de facto standard to represent ontology-lexica on the Web". [3] OntoLex-Lemon is a revision of the Lemon vocabulary originally proposed by McCrae et al. (2011). [4]
The core elements of OntoLex-Lemon, shown in Fig. 1, are:
Aside from the core module (namespace http://www.w3.org/ns/lemon/ontolex#), other modules specify designated vocabulary for representing lexicon metadata [6] (namespace http://www.w3.org/ns/lemon/lime#), lexical-semantic relations (e.g., translation and variation, namespace http://www.w3.org/ns/lemon/vartrans#), multi-word expressions (decomposition, namespace http://www.w3.org/ns/lemon/decomp#) and syntactic frames (namespace http://www.w3.org/ns/lemon/synsem#).
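As a rough illustration of how the core and lime namespaces fit together, the following minimal sketch builds one lexical entry with a canonical form and a sense that points to an ontology entity, and prints it as N-Triples. It uses only the Python standard library. The `http://example.org/lexicon/` IRIs, the helper names (`iri`, `literal`, `build_entry`, `to_ntriples`) and the choice of a DBpedia resource as the ontology reference are assumptions made for this example, not part of the specification.

```python
# Namespaces as given in the text above.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LIME = "http://www.w3.org/ns/lemon/lime#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base IRI for the example lexicon (an assumption).
EX = "http://example.org/lexicon/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Format a language-tagged literal with minimal escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_entry(lemma, lang, reference):
    """Return triples for one entry: lexicon -> entry -> form / sense -> reference."""
    lexicon = EX + "lexicon_" + lang
    entry = EX + lemma + "_entry"
    form = EX + lemma + "_form"
    sense = EX + lemma + "_sense"
    return [
        # Lexicon metadata (lime module)
        (iri(lexicon), iri(RDF_TYPE), iri(LIME + "Lexicon")),
        (iri(lexicon), iri(LIME + "language"), f'"{lang}"'),
        (iri(lexicon), iri(LIME + "entry"), iri(entry)),
        # Lexical entry with its canonical form (core module)
        (iri(entry), iri(RDF_TYPE), iri(ONTOLEX + "LexicalEntry")),
        (iri(entry), iri(ONTOLEX + "canonicalForm"), iri(form)),
        (iri(form), iri(RDF_TYPE), iri(ONTOLEX + "Form")),
        (iri(form), iri(ONTOLEX + "writtenRep"), literal(lemma, lang)),
        # Sense linking the entry to an ontology entity
        (iri(entry), iri(ONTOLEX + "sense"), iri(sense)),
        (iri(sense), iri(RDF_TYPE), iri(ONTOLEX + "LexicalSense")),
        (iri(sense), iri(ONTOLEX + "reference"), iri(reference)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) strings as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# The DBpedia IRI stands in for an arbitrary ontology entity (an assumption).
triples = build_entry("cat", "en", "http://dbpedia.org/resource/Cat")
ntriples = to_ntriples(triples)
print(ntriples)
```

In practice an RDF library would normally handle serialization and escaping; plain strings are used here only to keep the sketch free of dependencies.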
The data structures of OntoLex-Lemon are comparable with those of other dictionary formats (see related vocabularies below). What is innovative about OntoLex-Lemon is that it provides such a data model as an RDF vocabulary, which enables new use cases based on web technologies rather than stand-alone dictionaries (e.g., translation inference, see applications below). For the foreseeable future, OntoLex-Lemon will also remain unique in this role, as the (Linguistic) Linked Open Data community strongly encourages the reuse of existing vocabularies [7] and, as of Dec 2019, OntoLex-Lemon is the only established (i.e., published by W3C or another standardization initiative) vocabulary for its purpose. This is also reflected in recent extensions to the original OntoLex-Lemon specification, where additional modules have been developed to extend the use of OntoLex-Lemon to new areas of application:
OntoLex-Lemon is widely used for lexical resources in the context of Linguistic Linked Open Data. Selected applications include:
OntoLex development is regularly addressed at scientific events dedicated to ontologies, linked data or lexicography. Since 2017, a dedicated workshop series on the OntoLex module has been held biannually. [38]
Related vocabularies that focus on standardizing and publishing lexical resources include DICT (text-based format), the XML Dictionary eXchange Format, TEI-Dict (XML) and the Lexical Markup Framework (abstract model usually serialized in XML; the Lemon vocabulary originally evolved from an RDF serialization of LMF). OntoLex-Lemon differs from these earlier models in being a native Linked Open Data vocabulary that does not merely formalize the structure and semantics of machine-readable dictionaries, but is designed to facilitate information integration between them.