Expressive power (computer science)

Last updated

In computer science, the expressive power (also called expressiveness or expressivity) of a language is the breadth of ideas that can be represented and communicated in that language. The more expressive a language is, the greater the variety and quantity of ideas it can be used to represent.

Contents

For example, the Web Ontology Language expression language profile (OWL2 EL) lacks ideas (such as negation) that can be expressed in OWL2 RL (rule language). OWL2 EL may therefore be said to have less expressive power than OWL2 RL. These restrictions allow for more efficient (polynomial time) reasoning in OWL2 EL than in OWL2 RL. So OWL2 EL trades some expressive power for more efficient reasoning (processing of the knowledge representation language). [1]

Information description

The term expressive power may be used with a range of meaning. It may mean a measure of the ideas expressible in that language: [2]

The first sense dominates in areas of mathematics and logic that deal with the formal description of languages and their meaning, such as formal language theory, mathematical logic and process algebra. [2]

In informal discussions, the term often refers to the second sense, or to both. This is often the case when discussing programming languages. [3] [ page needed ] Efforts have been made to formalize these informal uses of the term. [4]

The notion of expressive power is always relative to a particular kind of thing that the language in question can describe, and the term is normally used when comparing languages that describe the same kind of things, or at least comparable kinds of things. [4]

The design of languages and formalisms involves a trade-off between expressive power and analyzability. The more a formalism can express, the harder it becomes to understand what instances of the formalism say. Decision problems become harder to answer or completely undecidable. [5]

Examples

In formal language theory

Formal language theory mostly studies formalisms to describe sets of strings, such as context-free grammars and regular expressions. Each instance of a formalism, e.g. each grammar and each regular expression, describes a particular set of strings. In this context, the expressive power of a formalism is the set of sets of strings its instances describe, and comparing expressive power is a matter of comparing these sets.

An important yardstick for describing the relative expressive power of formalisms in this area is the Chomsky hierarchy. It says, for instance, that regular expressions, nondeterministic finite automata and regular grammars have equal expressive power, while that of context-free grammars is greater; what this means is that the sets of sets of strings described by the first three formalisms are equal, and a proper subset of the set of sets of strings described by context-free grammars.

In this area, the cost of expressive power is a central topic of study. It is known, for instance, that deciding whether two arbitrary regular expressions describe the same set of strings is hard, while doing the same for arbitrary context-free grammars is completely impossible. However, it can still be efficiently decided whether any given string is in the set.

For more expressive formalisms, this problem can be harder, or even undecidable. For a Turing complete formalism, such as arbitrary formal grammars, not only this problem, but every nontrivial property regarding the set of strings they describe is undecidable, a fact known as Rice's Theorem.

There are some results on conciseness as well; for instance, nondeterministic finite automata and regular grammars are more concise than regular expressions, in the sense that the latter can be translated to the former without a blowup in size (i.e. in O(1)), while the reverse is not possible.

Similar considerations apply to formalisms that describe not sets of strings, but sets of trees (e.g. XML schema languages), of graphs, or other structures.

In database theory

Database theory is concerned, among other things, with database queries, e.g. formulas that, given the contents of a database, specify certain information to be extracted from it. In the predominant relational database paradigm, the contents of a database are described as a finite set of finite mathematical relations; Boolean queries, that always yield true or false, are formulated in first-order logic.

It turns out that first-order logic is lacking in expressive power: it cannot express certain types of Boolean queries, e.g. queries involving transitive closure. [6] However, adding expressive power must be done with care: it must still remain possible to evaluate queries with reasonable efficiency, which is not the case, e.g., for second-order logic. Consequently, a literature sprang up in which many query languages and language constructs were compared on the basis of expressive power and efficiency, e.g. various versions of Datalog. [7]

Similar considerations apply for query languages on other types of data, e.g. XML query languages such as XQuery.

See also

Related Research Articles

<span class="mw-page-title-main">Chomsky hierarchy</span> Hierarchy of classes of formal grammars

The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a language's vocabulary that are valid according to the language's syntax. Linguist Noam Chomsky theorized that four different classes of formal grammars existed that could generate increasingly complex languages. Each class can also completely generate the language of all inferior classes.

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

<span class="mw-page-title-main">Regular expression</span> Sequence of characters that forms a search pattern

A regular expression is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In theoretical computer science and formal language theory, a regular grammar is a grammar that is right-regular or left-regular. While their exact definition varies from textbook to textbook, they all require that

<span class="mw-page-title-main">Theory of computation</span> Academic subfield of computer science

In theoretical computer science and mathematics, the theory of computation is the branch that deals with what problems can be solved on a model of computation, using an algorithm, how efficiently they can be solved or to what degree. The field is divided into three major branches: automata theory and formal languages, computability theory, and computational complexity theory, which are linked by the question: "What are the fundamental capabilities and limitations of computers?".

<span class="mw-page-title-main">Turing machine</span> Computation model defining an abstract machine

A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algorithm.

A formal system is an abstract structure used for inferring theorems from axioms according to a set of rules. These rules, which are used for carrying out the inference of theorems from axioms, are the logical calculus of the formal system. A formal system is essentially an "axiomatic system".

Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science. The computability of a problem is closely linked to the existence of an algorithm to solve the problem.

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

In computer science, a Van Wijngaarden grammar is a formalism for defining formal languages. The name derives from the formalism invented by Adriaan van Wijngaarden for the purpose of defining the ALGOL 68 programming language. The resulting specification remains its most notable application.

In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

<span class="mw-page-title-main">Grammar induction</span>

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

In automata theory, the class of unrestricted grammars is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted grammar, other than each of their left-hand sides being non-empty. This grammar class can generate arbitrary recursively enumerable languages.

In theoretical computer science and formal language theory, a regular tree grammar is a formal grammar that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a set of single-path trees.

An affix grammar is a kind of formal grammar; it is used to describe the syntax of languages, mainly computer languages, using an approach based on how natural language is typically described.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

In computational learning theory, induction of regular languages refers to the task of learning a formal description of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way, approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction.

References

  1. Grau, Bernardo Cuenca; Horrocks, Ian; Motik, Boris; Parsia, Bijan; Patel-Schneider, Peter; Sattler, Ulrike (2008). "OWL 2: The next step for OWL". Web Semantics: Science, Services and Agents on the World Wide Web. 6 (4): 309–322. doi:10.1016/j.websem.2008.05.001. ISSN   1570-8268.
  2. 1 2 Farmer, William (2007). "Chiron: A multi-paradigm logic". In R. Matuszewski; A. Zalewska (eds.). From Insight to Proof: Festschrift in Honour of Andrzej Trybulec. Studies in Logic, Grammar and Rhetoric. pp. 1–19. ISBN   978-83-7431-128-1.
  3. Structure and Interpretation of Computer Programs , by Abelson and Sussman
  4. 1 2 Felleisen, Matthias (1991-12-01). "On the expressive power of programming languages". Science of Computer Programming. 17 (1): 35–75. doi: 10.1016/0167-6423(91)90036-W .
  5. Okhotin, Alexander (December 2005). "Unresolved systems of language equations: Expressive power and decision problems". Theoretical Computer Science. 349 (3): 283–308. doi: 10.1016/j.tcs.2005.07.038 .
  6. Serge Abiteboul, Richard B. Hull, Victor Vianu: Foundations of Databases. Addison-Wesley, 1995.
  7. Evgeny Dantsin, Thomas Eiter, Georg Gottlob, and Andrei Voronkov: Complexity and expressive power of logic programming. ACM Comput. Surv. 33(3): 374-425 (2001).