Abstract family of languages

Last updated

In computer science, in particular in the field of formal language theory, an abstract family of languages is an abstract mathematical notion generalizing characteristics common to the regular languages, the context-free languages and the recursively enumerable languages, and other families of formal languages studied in the scientific literature.

Contents

Formal definitions

A formal language is a set L for which there exists a finite set of abstract symbols Σ such that , where * is the Kleene star operation.

A family of languages is an ordered pair , where

  1. Σ is an infinite set of symbols;
  2. Λ is a set of formal languages;
  3. For each L in Λ there exists a finite subset such that ; and
  4. L ≠ Ø for some L in Λ.

A trio is a family of languages closed under homomorphisms that do not introduce the empty word, inverse homomorphisms, and intersections with a regular language.

A full trio, also called a cone, is a trio closed under arbitrary homomorphism.

A (full) semi-AFL is a (full) trio closed under union.

A (full) AFL is a (full) semi-AFL closed under concatenation and the Kleene plus.

Some families of languages

The following are some simple results from the study of abstract families of languages. [1]

Within the Chomsky hierarchy, the regular languages, the context-free languages, and the recursively enumerable languages are all full AFLs. However, the context sensitive languages and the recursive languages are AFLs, but not full AFLs because they are not closed under arbitrary homomorphisms.

The family of regular languages are contained within any cone (full trio). Other categories of abstract families are identifiable by closure under other operations such as shuffle, reversal, or substitution. [2]

Origins

Seymour Ginsburg of the University of Southern California and Sheila Greibach of Harvard University presented the first AFL theory paper at the IEEE Eighth Annual Symposium on Switching and Automata Theory in 1967. [3]

Notes

  1. Mateescu, A.; Salomaa, A. (2001) [1994], "Abstract family of languages", Encyclopedia of Mathematics , EMS Press
  2. Păun, Gh. (2001) [1994], "AFL operations", Encyclopedia of Mathematics , EMS Press
  3. Ginsburg & Greibach (1967)

Related Research Articles

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar. Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

In mathematical logic and computer science, the Kleene star is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid construction. The application of the Kleene star to a set is written as . It is widely used for regular expressions, which is the context in which it was introduced by Stephen Kleene to characterize certain automata, where it means "zero or more repetitions".

  1. If is a set of strings, then is defined as the smallest superset of that contains the empty string and is closed under the string concatenation operation.
  2. If is a set of symbols or characters, then is the set of all strings over symbols in , including the empty string .
<span class="mw-page-title-main">Pushdown automaton</span> Type of automaton

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

<span class="mw-page-title-main">Automata theory</span> Study of abstract machines and automata

Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science. The word automata comes from the Greek word αὐτόματος, which means "self-acting, self-willed, self-moving". An automaton is an abstract self-propelled computing device which follows a predetermined sequence of operations automatically. An automaton with a finite number of states is called a Finite Automaton (FA) or Finite-State Machine (FSM). The figure on the right illustrates a finite-state machine, which is a well-known type of automaton. This automaton consists of states and transitions. As the automaton sees a symbol of input, it makes a transition to another state, according to its transition function, which takes the previous state and current input symbol as its arguments.

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

Sheila Adele Greibach is a researcher in formal languages in computing, automata, compiler theory and computer science. She is an Emeritus Professor of Computer Science at the University of California, Los Angeles, and notable work include working with Seymour Ginsburg and Michael A. Harrison in context-sensitive parsing using the stack automaton model.

Seymour Ginsburg was an American pioneer of automata theory, formal language theory, and database theory, in particular; and computer science, in general. His work was influential in distinguishing theoretical Computer Science from the disciplines of Mathematics and Electrical Engineering.

In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. Terminal symbols are the elementary symbols of the language defined by a formal grammar. Nonterminal symbols are replaced by groups of terminal symbols according to the production rules.

Language equations are mathematical statements that resemble numerical equations, but the variables assume values of formal languages rather than numbers. Instead of arithmetic operations in numerical equations, the variables are joined by language operations. Among the most common operations on two languages A and B are the set union AB, the set intersection AB, and the concatenation AB. Finally, as an operation taking a single operand, the set A* denotes the Kleene star of the language A. Therefore language equations can be used to represent formal grammars, since the languages generated by the grammar must be the solution of a system of language equations.

In mathematics and theoretical computer science, a semiautomaton is a deterministic finite automaton having inputs but no output. It consists of a set Q of states, a set Σ called the input alphabet, and a function T: Q × Σ → Q called the transition function.

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

In formal language theory, a cone is a set of formal languages that has some desirable closure properties enjoyed by some well-known sets of languages, in particular by the families of regular languages, context-free languages and the recursively enumerable languages. The concept of a cone is a more abstract notion that subsumes all of these families. A similar notion is the faithful cone, having somewhat relaxed conditions. For example, the context-sensitive languages do not form a cone, but still have the required properties to form a faithful cone.

An abstract family of acceptors (AFA) is a grouping of generalized acceptors. Informally, an acceptor is a device with a finite state control, a finite number of input symbols, and an internal store with a read and write function. Each acceptor has a start state and a set of accepting states. The device reads a sequence of symbols, transitioning from state to state for each input symbol. If the device ends in an accepting state, the device is said to accept the sequence of symbols. A family of acceptors is a set of acceptors with the same type of internal store. The study of AFA is part of AFL (abstract families of languages) theory.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

<span class="mw-page-title-main">Noncommutative signal-flow graph</span>

In automata theory and control theory, branches of mathematics, theoretical computer science and systems engineering, a noncommutative signal-flow graph is a tool for modeling interconnected systems and state machines by mapping the edges of a directed graph to a ring or semiring.

In mathematics, a local language is a formal language for which membership of a word in the language can be determined by looking at the first and last symbol and each two-symbol substring of the word. Equivalently, it is a language recognised by a local automaton, a particular kind of deterministic finite automaton.

References