Attempto Controlled English (ACE) is a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. [1] It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced. [2]
ACE can serve as a knowledge representation, specification, and query language, and is intended for professionals who want to use formal notations and formal methods but may not be familiar with them. Though ACE appears perfectly natural (it can be read and understood by any speaker of English), it is in fact a formal language. [1]
ACE and its related tools have been used in the fields of software specifications, theorem proving, proof assistants, text summaries, ontologies, rules, querying, medical documentation and planning.
Here are some simple examples:
ACE construction rules require that each noun be introduced by a determiner (a, every, no, some, at least 5, ...). Regarding the list of examples above, ACE interpretation rules decide that (1) is interpreted as universally quantified, while (2) is interpreted as existentially quantified. Sentences like "Women are human" do not follow ACE syntax and are consequently not valid.
Interpretation rules resolve the anaphoric references in (3): the tie and it of the second sentence refer to a new tie of the first sentence, while his and the man of the second sentence refer to a man of the first sentence. Thus an ACE text is a coherent entity of anaphorically linked sentences.
The Attempto Parsing Engine (APE) translates ACE texts unambiguously into discourse representation structures (DRS) that use a variant of the language of first-order logic. [3] A DRS can be further translated into other formal languages, for instance AceRules with various semantics, [4] OWL, [5] and SWRL. Translating an ACE text into (a fragment of) first-order logic allows users to reason about the text, for instance to verify, to validate, and to query it.
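As a rough illustration of such a translation, consider the hypothetical sentence "A customer inserts a card." Since nouns introduced by the determiner a are interpreted as existentially quantified, a simplified first-order reading could look as follows. The predicate names are assumptions chosen for readability; the DRS that APE actually produces uses its own notation, which is not reproduced here.

```latex
\exists x\, \exists y\, \bigl( \mathrm{customer}(x) \land \mathrm{card}(y) \land \mathrm{insert}(x, y) \bigr)
```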
As an overview of ACE version 6.6, this section:
The vocabulary of ACE comprises:
The grammar of ACE defines and constrains the form and the meaning of ACE sentences and texts. ACE's grammar is expressed as a set of construction rules. The meaning of sentences is described as a small set of interpretation rules. A Troubleshooting Guide describes how to use ACE and how to avoid pitfalls.
An ACE text is a sequence of declarative sentences that can be anaphorically interrelated. Furthermore, ACE supports questions and commands.
A simple sentence asserts that something is the case—a fact, an event, a state.
Simple ACE sentences have the following general structure:
Every sentence has a subject and a verb. Complements (direct and indirect objects) are necessary for transitive verbs (insert something) and ditransitive verbs (give something to somebody), whereas adjuncts (adverbs, prepositional phrases) are optional.
All elements of a simple sentence can be elaborated upon to describe the situation in more detail. To further specify the nouns customer and card, we could add adjectives:
possessive nouns and of-prepositional phrases:
or variables as appositions:
Other modifications of nouns are possible through relative sentences:
which are described below since they make a sentence composite. We can also detail the insertion event, e.g. by adding an adverb:
or, equivalently:
or, by adding prepositional phrases:
We can combine all of these elaborations to arrive at:
Composite sentences are recursively built from simpler sentences through coordination, subordination, quantification, and negation. Note that ACE composite sentences overlap with what linguists call compound sentences and complex sentences.
Coordination by and is possible between sentences and between phrases of the same syntactic type.
Note that the coordination of the noun phrases a card and a code represents a plural object.
Coordination by or is possible between sentences, verb phrases, and relative clauses.
Coordination by and and or is governed by the standard binding order of logic, i.e. and binds more strongly than or. Commas can be used to override the standard binding order. Thus the sentence:
means that the customer inserts a VisaCard and a code, or alternatively a MasterCard and a code.
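The effect of the binding order can be sketched in Python, whose boolean operators happen to follow the same convention (and binds more strongly than or). The three parameters are hypothetical truth values standing for "the customer inserts a VisaCard", "the customer inserts a MasterCard", and "the customer inserts a code"; this is only an illustration of the two groupings, not part of any ACE tool.

```python
from itertools import product


def default_reading(visa, master, code):
    # Standard binding order: "and" binds more strongly than "or",
    # so this is read as: visa or (master and code).
    return visa or master and code


def comma_reading(visa, master, code):
    # Grouping overridden, as by a comma: (visa or master) and code,
    # i.e. a VisaCard and a code, or a MasterCard and a code.
    return (visa or master) and code


def differing_cases():
    # All truth assignments on which the two readings disagree.
    return [
        values
        for values in product([False, True], repeat=3)
        if default_reading(*values) != comma_reading(*values)
    ]
```

The two readings disagree exactly when a VisaCard is inserted without a code, which is the situation the overriding grouping rules out.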
There are four constructs of subordination: relative sentences, if-then sentences, modality, and sentence subordination.
Relative sentences starting with who, which, and that allow detail to be added to nouns:
With the help of if-then sentences we can specify conditional or hypothetical situations:
Note the anaphoric reference via the pronoun it in the then-part to the noun phrase a card in the if-part.
Modality allows us to express possibility and necessity:
Sentence subordination comes in various forms:
Quantification allows us to speak about all objects of a certain class (universal quantification), or to denote explicitly the existence of at least one object of this class (existential quantification). The textual occurrence of a universal or existential quantifier opens its scope, which extends to the end of the sentence, or in coordinations to the end of the respective coordinated sentence.
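The order in which quantifiers open their scopes changes the meaning of a sentence. A minimal sketch, assuming a small finite model with hypothetical customers, cards, and an inserts relation, checks the two possible orderings of a universal and an existential quantifier. The function and variable names are invented for this illustration.

```python
def every_customer_inserts_some_card(customers, cards, inserts):
    # Universal quantifier first:
    # for every customer x there is a card y such that x inserts y.
    return all(any((x, y) in inserts for y in cards) for x in customers)


def some_card_inserted_by_every_customer(customers, cards, inserts):
    # Existential quantifier first:
    # there is a card y such that every customer x inserts y.
    return any(all((x, y) in inserts for x in customers) for y in cards)


customers = {"alice", "bob"}
cards = {"card1", "card2"}

# Each customer inserts a different card.
separate = {("alice", "card1"), ("bob", "card2")}
# Both customers insert the same card.
shared = {("alice", "card1"), ("bob", "card1")}
```

On the `separate` relation only the first reading holds, while on the `shared` relation both hold, since the second reading is the stronger one.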
To express that all involved customers insert cards we can write:
This sentence means that each customer inserts a card that may, or may not, be the same as the one inserted by another customer. To specify that all customers insert the same card—however unrealistic that situation seems—we can write:
or, equivalently:
To state that every card is inserted by a customer we write:
or, somewhat indirectly:
Negation allows us to express that something is not the case:
To negate something for all objects of a certain class one uses no:
or, there is no:
To negate a complete statement one uses sentence negation:
These forms of negation are logical negations, i.e. they state that something is provably not the case. Negation as failure states that a state of affairs cannot be proved, i.e. there is no information whether the state of affairs is the case or not.
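The difference between the two kinds of negation can be sketched with a toy knowledge base. In this simplified illustration "provable" is reduced to membership in a set of stated facts, and all names are hypothetical; a real reasoner would derive consequences rather than look up literal strings.

```python
def logical_status(positive_facts, negative_facts, statement):
    # Logical negation: a statement is false only if its negation
    # has been stated explicitly; otherwise its status is unknown.
    if statement in positive_facts:
        return "true"
    if statement in negative_facts:
        return "false"
    return "unknown"


def negation_as_failure(positive_facts, statement):
    # Negation as failure: the negation holds whenever the statement
    # cannot be proved from the available facts.
    return statement not in positive_facts


positive = {"the customer inserts a card"}
negative = {"the customer enters a code"}
```

For a statement that is mentioned in neither set, `logical_status` reports "unknown", whereas `negation_as_failure` treats its negation as holding.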
ACE supports two forms of queries: yes/no-queries and wh-queries.
Yes/no-queries ask for the existence or non-existence of a specified situation. If we specified:
then we can ask:
to get a positive answer. Note that interrogative sentences always end with a question mark.
With the help of wh-queries, i.e. queries with query words, we can interrogate a text for details of the specified situation. If we specified:
we can ask for each element of the sentence with the exception of the verb.
Queries can also be constructed by a sequence of declarative sentences followed by one interrogative sentence, for example:
ACE also supports commands. Some examples:
A command always consists of a noun phrase (the addressee), followed by a comma, followed by an uncoordinated verb phrase. Furthermore, a command has to end with an exclamation mark.
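The surface shape of a command can be checked with a short sketch. It only tests the pattern described above (addressee, comma, verb phrase without and/or, exclamation mark) using regular expressions; it does not parse ACE, and the function name and example commands are hypothetical.

```python
import re

# Addressee, comma, verb phrase, final exclamation mark.
COMMAND = re.compile(r"^(?P<addressee>[^,!?.]+),\s*(?P<verb_phrase>[^,!?.]+)!$")
# A verb phrase containing "and" or "or" is treated as coordinated.
COORDINATOR = re.compile(r"\b(and|or)\b")


def split_command(text):
    """Return (addressee, verb_phrase), or None if the shape does not fit."""
    match = COMMAND.match(text.strip())
    if match is None:
        return None
    verb_phrase = match.group("verb_phrase").strip()
    if COORDINATOR.search(verb_phrase):
        return None
    return match.group("addressee").strip(), verb_phrase
```

For example, `split_command("John, go to the bank!")` yields the addressee and the verb phrase, while the same text ending in a full stop is rejected.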
To constrain the ambiguity of full natural language, ACE employs three simple means:
In natural language, relative sentences combined with coordinations can introduce ambiguity:
In ACE the sentence has the unequivocal meaning that the customer opens an account, as reflected by the paraphrase:
To express the alternative—though not very realistic—meaning that the card opens an account, the relative pronoun that must be repeated, thus yielding a coordination of relative sentences:
This sentence is unambiguously equivalent in meaning to the paraphrase:
Not all ambiguities can be safely removed from ACE without rendering it artificial. To deterministically interpret otherwise syntactically correct ACE sentences we use a small set of interpretation rules. For example, if we write:
then with a code attaches to the verb inserts, but not to a card. However, this is probably not what we meant to say. To express that the code is associated with the card we can employ the interpretation rule that a relative sentence always modifies the immediately preceding noun phrase, and rephrase the input as:
yielding the paraphrase:
or—to specify that the customer inserts a card and a code—as:
Usually ACE texts consist of more than one sentence:
To express that all occurrences of card and code should mean the same card and the same code, ACE provides anaphoric references via the definite article:
During the processing of the ACE text, all anaphoric references are replaced by the most recent and most specific accessible noun phrase that agrees in gender and number. As an example of "most recent and most specific", suppose an ACE parser is given the sentence:
Then:
refers to the second card, while:
refers to the first card.
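The "most recent and most specific" rule can be sketched as a backward search over the noun phrases introduced so far. This is a simplified illustration under stated assumptions: noun phrases are plain word lists whose last word is the head noun, a reference matches an antecedent when the head nouns agree and all modifiers of the reference occur in the antecedent, and gender, number, and accessibility are ignored. The function name and the example data are hypothetical.

```python
def resolve(antecedents, reference):
    """Find the antecedent for a definite noun phrase.

    antecedents: noun phrases in textual order, each a list of words
                 with the head noun last, e.g. ["red", "card"].
    reference:   the definite noun phrase without "the".
    """
    head = reference[-1]
    modifiers = set(reference[:-1])
    # Search from the most recent noun phrase backwards.
    for candidate in reversed(antecedents):
        same_head = candidate[-1] == head
        covers_modifiers = modifiers <= set(candidate[:-1])
        if same_head and covers_modifiers:
            return candidate
    return None


introduced = [["customer"], ["red", "card"], ["blue", "card"]]
```

With these noun phrases, a bare reference to the card resolves to the most recently introduced card, while a reference that repeats a modifier resolves to the card carrying that modifier.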
Noun phrases within if-then sentences, universally quantified sentences, negations, modality, and subordinated sentences cannot be referred to anaphorically from subsequent sentences, i.e. such noun phrases are not "accessible" from the following text. Thus for each of the sentences:
we cannot refer to a card with:
Anaphoric references are also possible via personal pronouns:
or via variables:
Anaphoric references via definite articles and variables can be combined:
Note that proper names like SimpleMat always refer to the same object.