Frame-based terminology

Frame-based terminology is a cognitive approach to terminology developed by Pamela Faber and colleagues at the University of Granada. One of its basic premises is that the conceptualization of any specialized domain is goal-oriented, and depends to a certain degree on the task to be accomplished. Since a major problem in modeling any domain is the fact that languages can reflect different conceptualizations and construals, texts as well as specialized knowledge resources are used to extract a set of domain concepts. Language structure is also analyzed to obtain an inventory of conceptual relations to structure these concepts.

Cognitive linguistics (CL) is an interdisciplinary branch of linguistics that combines knowledge and research from psychology and linguistics. It describes how language interacts with cognition, how language shapes our thoughts, and how language evolves in parallel with changes in the common mindset over time.

Terminology is the study of terms and their use. Terms are words and compound words or multi-word expressions that in specific contexts are given specific meanings—these may deviate from the meanings the same words have in other contexts and in everyday language. Terminology is a discipline that studies, among other things, the development of such terms and their interrelationships within a specialized domain. Terminology differs from lexicography, as it involves the study of concepts, conceptual systems and their labels (terms), whereas lexicography studies words and their meanings.

Pamela Faber Benítez is an American/Spanish linguist. She has held the Chair of Translation and Interpreting in the Department of Translation and Interpreting of the University of Granada since 2001. She received her Ph.D. from the University of Granada in 1986 and also holds degrees from the University of North Carolina at Chapel Hill and Paris-Sorbonne University.

As its name implies, frame-based terminology uses certain aspects of frame semantics to structure specialized domains and create non-language-specific representations. Such configurations are the conceptual meaning underlying specialized texts in different languages, and thus facilitate specialized knowledge acquisition.

Frame semantics is a theory of linguistic meaning developed by Charles J. Fillmore that extends his earlier case grammar. It relates linguistic semantics to encyclopedic knowledge. The basic idea is that one cannot understand the meaning of a single word without access to all the essential knowledge that relates to that word. For example, one would not be able to understand the word "sell" without knowing anything about the situation of commercial transfer, which also involves, among other things, a seller, a buyer, goods, money, and the relations among them: between the money and the goods, between the seller and the goods and the money, between the buyer and the goods and the money, and so on.
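
As a rough illustration (not part of frame semantics or frame-based terminology itself), the frame evoked by "sell" could be sketched as a data structure whose participant roles are filled when the word is used in context. The role names below are simplified assumptions and do not reproduce FrameNet's exact frame elements.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A semantic frame: a schematic situation with named participant roles."""
    name: str
    elements: dict  # role name -> filler (None if unfilled)

# Illustrative commercial-transaction frame evoked by verbs such as "sell" and "buy".
commercial_transaction = Frame(
    name="Commercial_transaction",
    elements={"Seller": None, "Buyer": None, "Goods": None, "Money": None},
)

def evoke(frame: Frame, **fillers) -> Frame:
    """Instantiate a frame with concrete fillers for some of its roles."""
    instance = Frame(frame.name, dict(frame.elements))
    instance.elements.update(fillers)
    return instance

# "Mary sold the bicycle to John for 50 euros."
sale = evoke(commercial_transaction,
             Seller="Mary", Buyer="John", Goods="bicycle", Money="50 euros")
print(sale.elements)
```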

Frame-based terminology focuses on:

  1. conceptual organization;
  2. the multidimensional nature of terminological units; and
  3. the extraction of semantic and syntactic information through the use of multilingual corpora.

In frame-based terminology, conceptual networks are based on an underlying domain event, which generates templates for the actions and processes that take place in the specialized field as well as the entities that participate in them.

As a result, knowledge extraction is largely text-based. The terminological entries are composed of information from specialized texts as well as specialized language resources. Knowledge is configured and represented in a dynamic conceptual network that is capable of adapting to new contexts. At the most general level, generic roles of agent, patient, result, and instrument are activated by basic predicate meanings such as make, do, affect, use, become, etc. which structure the basic meanings in specialized texts. From a linguistic perspective, Aktionsart distinctions in texts are based on Van Valin's classification of predicate types. At the more specific levels of the network, the qualia structure of the generative lexicon is used as a basis for the systematic classification and relation of nominal entities.
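
A minimal sketch of how such a template might look in code: basic predicate meanings activate subsets of the generic roles, and an event instance fills those roles. The predicate-to-role mapping and the coastal-engineering flavour of the example are illustrative assumptions, not Faber's actual implementation.

```python
# Toy domain-event template: basic predicates activate generic roles
# (agent, patient, result, instrument) that structure event instances.
PREDICATE_TEMPLATES = {
    "make":   {"agent", "patient", "result"},
    "affect": {"agent", "patient"},
    "use":    {"agent", "instrument", "patient"},
    "become": {"patient", "result"},
}

def instantiate(predicate: str, **fillers: str) -> dict:
    """Build an event instance, allowing only roles the predicate activates."""
    roles = PREDICATE_TEMPLATES[predicate]
    unknown = set(fillers) - roles
    if unknown:
        raise ValueError(f"{predicate!r} does not activate roles: {unknown}")
    return {"predicate": predicate, **{r: fillers.get(r) for r in roles}}

# e.g. "The storm (agent) erodes the beach (patient)" -> an 'affect' event
event = instantiate("affect", agent="storm", patient="beach")
print(event)
```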

The lexical aspect or Aktionsart of a verb is part of the way in which that verb is structured in relation to time. Any event, state, process, or action which a verb expresses—collectively, any eventuality—may also be said to have the same lexical aspect. Lexical aspect is distinguished from grammatical aspect: lexical aspect is an inherent property of a (semantic) eventuality, whereas grammatical aspect is a property of a particular syntactic or morphological realization. Lexical aspect is invariant, while grammatical aspect can be varied by the speaker's choice of form.

Generative Lexicon (GL) is a theory of linguistic semantics which focuses on the distributed nature of compositionality in natural language. The first major work outlining the framework is James Pustejovsky's "The Generative Lexicon" (1991). Subsequent important developments are presented in Pustejovsky and Boguraev (1993), Bouillon (1997), and Busa (1996). The first unified treatment of GL was given in Pustejovsky (1995). Unlike purely verb-based approaches to compositionality, Generative Lexicon attempts to spread the semantic load across all constituents of the utterance. Central to the philosophical perspective of GL are two major lines of inquiry: (1) How is it that we are able to deploy a finite number of words in our language in an unbounded number of contexts? (2) Are lexical information and the representations used in composing meanings separable from our commonsense knowledge?
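
The qualia structure mentioned above assigns each nominal four roles (formal, constitutive, telic, agentive). A minimal sketch, with an invented example entry and simplified glosses rather than Pustejovsky's notation:

```python
from dataclasses import dataclass

@dataclass
class Qualia:
    """The four qualia roles of the Generative Lexicon (Pustejovsky 1995)."""
    formal: str        # what kind of thing it is
    constitutive: str  # what it is made of / its parts
    telic: str         # its purpose or function
    agentive: str      # how it comes into being

# Illustrative entry for "novel"; the values are simplified glosses.
novel = Qualia(
    formal="book",
    constitutive="narrative text, chapters",
    telic="reading",
    agentive="writing",
)
print(novel)
```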

The methodology of frame-based terminology derives the conceptual system of the domain by means of an integrated top-down and bottom-up approach. The bottom-up approach consists of extracting information from a corpus of texts in various languages, specifically related to the domain. The top-down approach includes the information provided by specialized dictionaries and other reference material, complemented by the help of experts in the field.

In linguistics, a corpus or text corpus is a large and structured set of texts. In corpus linguistics, they are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
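
As a toy illustration of how the bottom-up and top-down steps could be combined, the sketch below extracts frequent content words from a tiny invented "corpus" and validates them against an invented reference term list. Real frame-based terminology work uses much larger multilingual corpora and richer extraction techniques.

```python
import re
from collections import Counter

# Tiny invented "corpus" standing in for specialized texts.
corpus = [
    "The breakwater protects the harbour from wave erosion.",
    "Wave erosion undermines the breakwater over time.",
    "Engineers reinforce the breakwater to reduce erosion.",
]

STOPWORDS = {"the", "from", "over", "to", "a", "of", "and"}

def candidate_terms(texts, min_freq=2):
    """Bottom-up step: frequent content words become candidate terms."""
    tokens = []
    for text in texts:
        tokens += [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    return {term for term, freq in counts.items() if freq >= min_freq}

# Top-down step (simplified): validate candidates against a reference resource.
reference_terms = {"breakwater", "erosion", "wave", "harbour", "groyne"}

validated = candidate_terms(corpus) & reference_terms
print(sorted(validated))  # ['breakwater', 'erosion', 'wave']
```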

In a parallel way, the underlying conceptual framework of a knowledge-domain event is specified. The most generic or base-level categories of a domain are configured in a prototypical domain event or action-environment interface. This provides a template applicable to all levels of information structuring. In this way a structure is obtained which facilitates and enhances knowledge acquisition since the information in term entries is internally as well as externally coherent.

Related Research Articles

Database

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

Knowledge representation and reasoning is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can utilize to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation incorporates findings from psychology about how humans solve problems and represent knowledge in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning, such as the application of rules or the relations of sets and subsets.
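
For instance, the application of rules can be sketched as forward chaining over a small fact base; the facts and the single rule below are invented for illustration.

```python
# Toy forward-chaining rule application over a fact base of (subject, relation, object) triples.
facts = {("flu", "causes", "fever"), ("patient1", "has_symptom", "fever")}

def rule_possible_diagnosis(facts):
    """If a condition causes a symptom the patient has, infer a possible diagnosis."""
    new = set()
    for (cond, rel1, sym1) in facts:
        for (pat, rel2, sym2) in facts:
            if rel1 == "causes" and rel2 == "has_symptom" and sym1 == sym2:
                new.add((pat, "possible_diagnosis", cond))
    return new

# Apply the rule repeatedly until no new facts are derived (a fixpoint).
while True:
    derived = rule_possible_diagnosis(facts) - facts
    if not derived:
        break
    facts |= derived

print(("patient1", "possible_diagnosis", "flu") in facts)  # True
```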

KL-ONE is a well-known knowledge representation system in the tradition of semantic networks and frames; that is, it is a frame language. The system is an attempt to overcome semantic indistinctness in semantic network representations and to explicitly represent conceptual information as a structured inheritance network.

Data model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.
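
The car example could be sketched as a minimal schema; the field names follow the example, and the concrete values are invented.

```python
from dataclasses import dataclass

@dataclass
class Owner:
    name: str

@dataclass
class Car:
    """Data element 'car', composed of further elements as the model prescribes."""
    color: str
    size: str
    owner: Owner

my_car = Car(color="red", size="compact", owner=Owner(name="Alice"))
print(my_car)
```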

Glossary

A glossary, also known as a vocabulary or clavis, is an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms. Traditionally, a glossary appears at the end of a book and includes terms within that book that are either newly introduced, uncommon, or specialized. While glossaries are most commonly associated with non-fiction books, in some cases novels may also come with a glossary for unfamiliar terms.

A modeling language is any artificial language that can be used to express information or knowledge or systems in a structure that is defined by a consistent set of rules. The rules are used for interpretation of the meaning of components in the structure.

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human-language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, video, and documents, can also be seen as information extraction.

IDEF

IDEF, initially an abbreviation of ICAM Definition and renamed Integration DEFinition in 1999, refers to a family of modeling languages in the field of systems and software engineering. They cover a wide range of uses, from functional modeling to data, simulation, object-oriented analysis/design, and knowledge acquisition. These "definition languages" were developed under funding from the U.S. Air Force and, although still most commonly used by it and by other military and United States Department of Defense (DoD) agencies, are in the public domain.

Legal translation

Legal translation is the translation of texts within the field of law. As law is a culture-dependent subject field, legal translation is not necessarily linguistically transparent. This lack of transparency can be avoided to some extent by the use of Latin legal terminology, where possible.

Information model

An information model in software engineering is a representation of concepts and the relationships, constraints, rules, and operations needed to specify data semantics for a chosen domain of discourse. Typically it specifies relations between kinds of things, but it may also include relations with individual things. It can provide a sharable, stable, and organized structure of information requirements or knowledge for the domain context.

In computer science and artificial intelligence, ontology languages are formal languages used to construct ontologies. They allow the encoding of knowledge about specific domains and often include reasoning rules that support the processing of that knowledge. Ontology languages are usually declarative languages, are almost always generalizations of frame languages, and are commonly based on either first-order logic or on description logic.
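
As a small illustration (assuming the rdflib Python library and an invented namespace), a fragment of such an ontology can be built programmatically and serialized to the Turtle ontology syntax:

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("http://example.org/coastal#")  # invented namespace
g = Graph()
g.bind("ex", EX)

# A tiny class hierarchy: a breakwater is a kind of coastal structure.
g.add((EX.CoastalStructure, RDF.type, RDFS.Class))
g.add((EX.Breakwater, RDF.type, RDFS.Class))
g.add((EX.Breakwater, RDFS.subClassOf, EX.CoastalStructure))
g.add((EX.Breakwater, RDFS.label, Literal("breakwater")))

print(g.serialize(format="turtle"))
```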

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

The Semantics of Business Vocabulary and Business Rules (SBVR) is an adopted standard of the Object Management Group (OMG) intended to be the basis for formal and detailed natural language declarative description of a complex entity, such as a business. SBVR is intended to formalize complex compliance rules, such as operational rules for an enterprise, security policy, standard compliance, or regulatory compliance rules. Such formal vocabularies and rules can be interpreted and used by computer systems. SBVR is an integral part of the OMG's model-driven architecture (MDA).

Ontology engineering

Ontology engineering in computer science, information science and systems engineering is a field which studies the methods and methodologies for building ontologies: formal representations of a set of concepts within a domain and the relationships between those concepts. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering. Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.

Paul Compton is an Emeritus Professor at the University of New South Wales (UNSW). He is also the former Head of the UNSW School of Computer Science and Engineering. He is known for proposing "ripple-down rules".

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

References

  1. Faber, Pamela. 2009. The cognitive shift in terminology and specialized translation. MonTI, no. 1: 107–134.
  2. Faber, Pamela, Pilar León Araúz, and Juan Antonio Prieto Velasco. 2009. Semantic Relations, Dynamicity, and Terminological Knowledge Bases. Current Issues in Language Studies 1: 1–23.
  3. Faber, Pamela, Pilar León Araúz, Juan Antonio Prieto Velasco, and Arianne Reimerink. 2007. Linking Images and Words: the description of specialized concepts. International Journal of Lexicography 20, no. 1: 39–65. doi:10.1093/ijl/ecl038.
  4. Faber, Pamela, Silvia Montero Martínez, María Rosa Castro Prieto, José Senso Ruiz, Juan Antonio Prieto Velasco, Pilar León Araúz, Carlos Márquez Linares, and Miguel Vega Expósito. 2006. Process-oriented terminology management in the domain of Coastal Engineering. Terminology 12, no. 2: 189–213. doi:10.1075/term.12.2.03fab.
  5. Faber, Pamela, Carlos Márquez Linares, and Miguel Vega Expósito. 2005. Framing Terminology: A Process-Oriented Approach. Meta: Journal des traducteurs / Meta: Translators' Journal 50, no. 4.