Controlled natural language


Controlled natural languages (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. [1] [2]


Languages of the first type (often called "simplified" or "technical" languages), such as ASD Simplified Technical English, Caterpillar Technical English, and IBM's Easy English, are used in industry to improve the quality of technical documentation and, possibly, to simplify its semi-automatic translation. These languages restrict the writer through general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". [3]
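As a rough illustration of how such writing rules can be checked mechanically, the following Python sketch tests a sentence against three of the rules quoted above. The word list, the pronoun list, the length limit, and the function name check_sentence are all hypothetical and chosen only for this example; they are not taken from any of the named languages, and the active-voice rule is left out because it needs real grammatical analysis.

```python
import re

# Hypothetical dictionary, pronoun list, and length limit, for illustration only.
# Real controlled languages define their own approved vocabulary and thresholds.
APPROVED_WORDS = {"remove", "the", "cover", "and", "clean", "filter"}
PRONOUNS = {"it", "they", "them", "this", "these", "those", "he", "she"}
MAX_WORDS = 20


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    problems = []
    # Rule: "Keep sentences short"
    if len(words) > MAX_WORDS:
        problems.append("sentence too long")
    # Rule: "Avoid the use of pronouns"
    if any(word in PRONOUNS for word in words):
        problems.append("pronoun used")
    # Rule: "Only use dictionary-approved words"
    unapproved = sorted(set(words) - APPROVED_WORDS - PRONOUNS)
    if unapproved:
        problems.append("unapproved words: " + ", ".join(unapproved))
    return problems


if __name__ == "__main__":
    print(check_sentence("Remove the cover and clean the filter."))
    print(check_sentence("Remove it and polish the cover."))
```

With these assumed word lists, the first sentence passes all three checks, while the second is flagged for a pronoun and for the word "polish".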

Languages of the second type have a formal syntax and formal semantics and can be mapped to an existing formal language, such as first-order logic. They can therefore be used as knowledge representation languages, [4] and writing in them is supported by fully automatic consistency and redundancy checks, query answering, etc.
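The mapping to first-order logic can be illustrated with a toy translator. The Python sketch below accepts three hypothetical sentence patterns and rewrites each into a first-order formula written in plain ASCII. The patterns, the output notation, and the function name to_fol are assumptions made for this example; they do not reproduce the grammar of any particular controlled language.

```python
import re

# Hypothetical sentence patterns paired with first-order logic templates.
# Real controlled languages define far richer grammars than these three forms.
PATTERNS = [
    (re.compile(r"^Every (\w+) is an? (\w+)\.$"), "forall x ({0}(x) -> {1}(x))"),
    (re.compile(r"^Some (\w+) is an? (\w+)\.$"), "exists x ({0}(x) & {1}(x))"),
    (re.compile(r"^No (\w+) is an? (\w+)\.$"), "forall x ({0}(x) -> ~{1}(x))"),
]


def to_fol(sentence):
    """Translate one controlled sentence into a first-order formula string."""
    for pattern, template in PATTERNS:
        match = pattern.match(sentence)
        if match:
            predicates = [group.lower() for group in match.groups()]
            return template.format(*predicates)
    # Sentences outside the restricted grammar are rejected, not guessed at.
    raise ValueError("sentence is not in the controlled language: " + sentence)


if __name__ == "__main__":
    print(to_fol("Every man is a human."))
    print(to_fol("No cat is a dog."))
```

Rejecting any sentence that falls outside the fixed patterns is what removes ambiguity in this sketch: every accepted sentence has exactly one logical reading.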

Languages

Existing controlled natural languages include: [5] [6]

Encoding

The IETF has reserved "simple" as a BCP 47 variant subtag for simplified versions of languages. [13]
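A language tag carrying this variant might look like "en-simple". The Python sketch below shows one simplified way to detect the variant in a tag. The function name has_simple_variant is hypothetical, and the check is not a full BCP 47 parser: it only splits the tag on hyphens, compares case-insensitively, and stops at the first single-character subtag, which begins an extension or private-use section.

```python
def has_simple_variant(tag):
    """Return True if a language tag carries the 'simple' variant subtag.

    Simplified check for illustration; it does not validate the tag.
    """
    subtags = tag.lower().split("-")
    # Skip the primary language subtag, which can never be a variant.
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            # A singleton starts an extension or private-use section,
            # so nothing after it is a registered variant.
            break
        if subtag == "simple":
            return True
    return False


if __name__ == "__main__":
    for example in ("en-simple", "en-US", "en-Latn-simple", "en-x-simple"):
        print(example, has_simple_variant(example))
```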

See also

Related Research Articles

Knowledge representation and reasoning is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation incorporates findings from psychology about how humans solve problems and represent knowledge, in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning.

Logic programming is a programming, database and knowledge representation paradigm based on formal logic. A logic program is a set of sentences in logical form, representing knowledge about some problem domain. Computation is performed by applying logical reasoning to that knowledge, to solve problems in the domain. Major logic programming language families include Prolog, Answer Set Programming (ASP) and Datalog. In all of these languages, rules are written in the form of clauses:

The following outline is provided as an overview and topical guide to linguistics:

Natural language processing (NLP) is an interdisciplinary subfield of computer science and information retrieval. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. To this end, natural language processing often borrows ideas from theoretical linguistics. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subset of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.

A specification language is a formal language in computer science used during systems analysis, requirements analysis, and systems design to describe a system at a much higher level than a programming language, which is used to produce the executable code for a system.

In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.


Logic in computer science covers the overlap between the field of logic and that of computer science. The topic can essentially be divided into three main areas:

ASD-STE100 Simplified Technical English (STE) is a controlled language designed to simplify and clarify technical documentation. It was originally developed during the 1980s by the European Association of Aerospace Industries (AECMA), at the request of the European airline industry, which wanted a standardized form of English for technical documentation that could be easily understood by non-English speakers. It has since been adopted in many other fields outside the aerospace, defense, and maintenance domains for its clear, consistent, and comprehensive nature. The current edition of the STE Specification, published in April 2021, consists of 53 writing rules and a dictionary of approximately 900 approved words.

Multilayered extended semantic networks (MultiNets) are both a knowledge representation paradigm and a language for representing the meaning of natural language expressions, developed by Prof. Dr. Hermann Helbig on the basis of earlier semantic networks. MultiNet is used in a question-answering application for German called InSicht. It is also used in a tutoring application developed by the University of Hagen to teach MultiNet to knowledge engineers.

Logic is the formal science of using reason and is considered a branch of both philosophy and mathematics and to a lesser extent computer science. Logic investigates and classifies the structure of statements and arguments, both through the study of formal systems of inference and the study of arguments in natural language. The scope of logic can therefore be very large, ranging from core topics such as the study of fallacies and paradoxes, to specialized analyses of reasoning such as probability, correct reasoning, and arguments involving causality. One of the aims of logic is to identify the correct and incorrect inferences. Logicians study the criteria for the evaluation of arguments.

Attempto Controlled English (ACE) is a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced.

Frames are an artificial intelligence data structure used to divide knowledge into substructures by representing "stereotyped situations". They were proposed by Marvin Minsky in his 1974 article "A Framework for Representing Knowledge". Frames are the primary data structure used in artificial intelligence frame languages; they are stored as ontologies of sets.


Diagrammatic reasoning is reasoning by means of visual representations. The study of diagrammatic reasoning is about the understanding of concepts and ideas, visualized with the use of diagrams and imagery instead of by linguistic or algebraic means.

The Semantics of Business Vocabulary and Business Rules (SBVR) is an adopted standard of the Object Management Group (OMG) intended to be the basis for formal and detailed natural language declarative description of a complex entity, such as a business. SBVR is intended to formalize complex compliance rules, such as operational rules for an enterprise, security policy, standard compliance, or regulatory compliance rules. Such formal vocabularies and rules can be interpreted and used by computer systems. SBVR is an integral part of the OMG's model-driven architecture (MDA).

Formal semantics is the study of grammatical meaning in natural languages using formal tools from logic, mathematics and theoretical computer science. It is an interdisciplinary field, sometimes regarded as a subfield of both linguistics and philosophy of language. It provides accounts of what linguistic expressions mean and how their meanings are composed from the meanings of their parts. The enterprise of formal semantics can be thought of as that of reverse-engineering the semantic components of natural languages' grammars.

Adrian David Walker is a US computer scientist, born in London, England.

The following outline is provided as an overview of and topical guide to natural-language processing:


Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.

References

  1. "A Survey and Classification of Controlled Natural Languages". direct.mit.edu. Retrieved 2024-03-27.
  2. "Controlled Natural Languages for language generation in artificial cognition". ieeexplore.ieee.org. Retrieved 2024-03-27.
  3. O'Brien, Sharon (2003). "Controlling Controlled English – An Analysis of Several Controlled Language Rule Sets" (PDF). Proceedings of EAMT-CLAW. Archived from the original (PDF) on 2016-03-03. Retrieved 2011-12-30.
  4. Schwitter, Rolf. "Controlled natural languages for knowledge representation." Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics, 2010.
  5. Kuhn, Tobias (2014). "A Survey and Classification of Controlled Natural Languages". Computational Linguistics. 40: 121–170. arXiv:1507.01701. doi:10.1162/COLI_a_00168. S2CID 14586568.
  6. Pool, Jonathan (2006). "Can Controlled Languages Scale to the Web?". Archived from the original on 2009-08-15.
  7. Norbert E. Fuchs; Kaarel Kaljurand; Gerold Schneider (2006). "Attempto Controlled English Meets the Challenges of Knowledge Representation, Reasoning, Interoperability and User Interfaces" (PDF). FLAIRS 2006.
  8. Ogden, Charles Kay (1930). Basic English: A General Introduction with Rules and Grammar. London: Paul Treber & Co., Ltd.
  9. "Common Logic Controlled English". www.jfsowa.com. Retrieved 27 August 2017.
  10. Kowalski, R., Dávila, J., Sartor, G. and Calejo, M., 2023. Logical English for law and education. In Prolog: The Next 50 Years (pp. 287-299). Cham: Springer Nature Switzerland.
  11. Wasik, Szymon; Prejzendanc, Tomasz; Blazewicz, Jacek (2013). "ModeLang: A New Approach for Experts-Friendly Viral Infections Modeling". Computational and Mathematical Methods in Medicine. 2013: 320715. doi:10.1155/2013/320715. PMC 3878415. PMID 24454531.
  12. Schwitter, Rolf; Tilbrook, M (2004). "PENG: Processable ENGlish". Technical Report, Macquarie University, Australia.
  13. Everson, Michael. "Registration form for 'simple'". IANA. Retrieved 22 April 2021.