Language equation

Last updated

Language equations are mathematical statements that resemble numerical equations, but the variables assume values of formal languages rather than numbers. Instead of arithmetic operations in numerical equations, the variables are joined by language operations. Among the most common operations on two languages A and B are the set union AB, the set intersection AB, and the concatenation AB. Finally, as an operation taking a single operand, the set A* denotes the Kleene star of the language A. Therefore, language equations can be used to represent formal grammars, since the languages generated by the grammar must be the solution of a system of language equations.

Contents

Language equations and context-free grammars

Ginsburg and Rice [1] gave an alternative definition of context-free grammars by language equations. To every context-free grammar , is associated a system of equations in variables . Each variable is an unknown language over and is defined by the equation where , ..., are all productions for . Ginsburg and Rice used a fixed-point iteration argument to show that a solution always exists, and proved that the assignment is the least solution to this system,[ clarify ] i.e. any other solution must be a subset[ clarify ] of this one.

For example, the grammar corresponds to the equation system which has as solution every superset of .

Language equations with added intersection analogously correspond to conjunctive grammars.[ citation needed ]

Language equations and finite automata

Brzozowski and Leiss [2] studied left language equations where every concatenation is with a singleton constant language on the left, e.g. with variable , but not nor . Each equation is of the form with one variable on the right-hand side. Every nondeterministic finite automaton has such corresponding equation using left-concatenation and union, see Fig. 1. If intersection operation is allowed, equations correspond to alternating finite automata.

Fig. 1: A finite automaton with associated system of equations
S
1
=
1
[?]
S
1
[?]
0
[?]
S
2
[?]
{
[?]
}
{\displaystyle S_{1}=1\cdot S_{1}\cup 0\cdot S_{2}\cup \{\epsilon \}}
,
S
2
=
1
[?]
S
2
[?]
0
[?]
S
1
{\displaystyle S_{2}=1\cdot S_{2}\cup 0\cdot S_{1}}
where
[?]
{\displaystyle \epsilon }
is the empty word. DFAexample.svg
Fig. 1: A finite automaton with associated system of equations , where is the empty word.

Baader and Narendran [3] studied equations using left-concatenation and union and proved that their satisfiability problem is EXPTIME-complete.

Conway's problem

Conway [4] proposed the following problem: given a constant finite language , is the greatest solution of the equation always regular? This problem was studied by Karhumäki and Petre [5] [6] who gave an affirmative answer in a special case. A strongly negative answer to Conway's problem was given by Kunc [7] who constructed a finite language such that the greatest solution of this equation is not recursively enumerable.

Kunc [8] also proved that the greatest solution of inequality is always regular.

Language equations with Boolean operations

Language equations with concatenation and Boolean operations were first studied by Parikh, Chandra, Halpern and Meyer [9] who proved that the satisfiability problem for a given equation is undecidable, and that if a system of language equations has a unique solution, then that solution is recursive. Later, Okhotin [10] proved that the unsatisfiability problem is RE-complete and that every recursive language is a unique solution of some equation.

Language equations over a unary alphabet

For a one-letter alphabet, Leiss [11] discovered the first language equation with a nonregular solution, using complementation and concatenation operations. Later, Jeż [12] showed that non-regular unary languages can be defined by language equations with union, intersection and concatenation, equivalent to conjunctive grammars. By this method Jeż and Okhotin [13] proved that every recursive unary language is a unique solution of some equation.

See also

Related Research Articles

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules called a formal grammar.

In mathematical logic and computer science, the Kleene star is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid construction. The application of the Kleene star to a set is written as . It is widely used for regular expressions, which is the context in which it was introduced by Stephen Kleene to characterize certain automata, where it means "zero or more repetitions".

  1. If is a set of strings, then is defined as the smallest superset of that contains the empty string and is closed under the string concatenation operation.
  2. If is a set of symbols or characters, then is the set of all strings over symbols in , including the empty string .

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In logic and computer science, unification is an algorithmic process of solving equations between symbolic expressions. For example, using x,y,z as variables, the singleton equation set { cons(x,cons(x,nil)) = cons(2,y) } is a syntactic first-order unification problem that has the substitution { x ↦ 2, ycons(2,nil) } as its only solution.

In computational complexity theory, EXPSPACE is the set of all decision problems solvable by a deterministic Turing machine in exponential space, i.e., in space, where is a polynomial function of . Some authors restrict to be a linear function, but most authors instead call the resulting class ESPACE. If we use a nondeterministic machine instead, we get the class NEXPSPACE, which is equal to EXPSPACE by Savitch's theorem.

The Post correspondence problem is an undecidable decision problem that was introduced by Emil Post in 1946. Because it is simpler than the halting problem and the Entscheidungsproblem it is often used in proofs of undecidability.

In mathematics, a Kleene algebra is an idempotent semiring endowed with a closure operator. It generalizes the operations known from regular expressions.

In computational complexity theory, the complexity class ELEMENTARY of elementary recursive functions is the union of the classes

The simply typed lambda calculus, a form of type theory, is a typed interpretation of the lambda calculus with only one type constructor that builds function types. It is the canonical and simplest example of a typed lambda calculus. The simply typed lambda calculus was originally introduced by Alonzo Church in 1940 as an attempt to avoid paradoxical use of the untyped lambda calculus.

In computability theory and computational complexity theory, RE is the class of decision problems for which a 'yes' answer can be verified by a Turing machine in a finite amount of time. Informally, it means that if the answer to a problem instance is 'yes', then there is some procedure that takes finite time to determine this, and this procedure never falsely reports 'yes' when the true answer is 'no'. However, when the true answer is 'no', the procedure is not required to halt; it may go into an "infinite loop" for some 'no' cases. Such a procedure is sometimes called a semi-algorithm, to distinguish it from an algorithm, defined as a complete solution to a decision problem.

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

Boolean grammars, introduced by Okhotin, are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with conjunction and negation operations. Besides these explicit operations, Boolean grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction and negation can be used, in particular, to specify intersection and complement of languages. An intermediate class of grammars known as conjunctive grammars allows conjunction and disjunction, but not negation.

Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

In mathematics, a local language is a formal language for which membership of a word in the language can be determined by looking at the first and last symbol and each two-symbol substring of the word. Equivalently, it is a language recognised by a local automaton, a particular kind of deterministic finite automaton.

<span class="mw-page-title-main">Constant-recursive sequence</span> Infinite sequence of numbers satisfying a linear equation

In mathematics and theoretical computer science, a constant-recursive sequence is an infinite sequence of numbers in which each number in the sequence is equal to a fixed linear combination of one or more of its immediate predecessors. The concept is variously known as a linear recurrence sequence, linear-recursive sequence, linear-recurrent sequence, a C-finite sequence, or a solution to a linear recurrence with constant coefficients.

<span class="mw-page-title-main">Juhani Karhumäki</span> Finnish mathematician and theoretical computer scientist

Eero Urho Juhani Karhumäki is a Finnish mathematician and theoretical computer scientist known for his contributions to automata theory. He is a professor at the University of Turku.

State complexity is an area of theoretical computer science dealing with the size of abstract automata, such as different kinds of finite automata. The classical result in the area is that simulating an -state nondeterministic finite automaton by a deterministic finite automaton requires exactly states in the worst case.

MINimum Relevant Variables in Linear System (Min-RVLS) is a problem in mathematical optimization. Given a linear program, it is required to find a feasible solution in which the number of non-zero variables is as small as possible.

References

  1. Ginsburg, Seymour; Rice, H. Gordon (1962). "Two Families of Languages Related to ALGOL". Journal of the ACM. 9 (3): 350–371. doi: 10.1145/321127.321132 . ISSN   0004-5411. S2CID   16718187.
  2. 1 2 Brzozowski, J.A.; Leiss, E. (1980). "On equations for regular languages, finite automata, and sequential networks". Theoretical Computer Science. 10 (1): 19–35. doi: 10.1016/0304-3975(80)90069-9 . ISSN   0304-3975.
  3. Baader, Franz; Narendran, Paliath (2001). "Unification of Concept Terms in Description Logics". Journal of Symbolic Computation. 31 (3): 277–305. doi: 10.1006/jsco.2000.0426 . ISSN   0747-7171.
  4. Conway, John Horton (1971). Regular Algebra and Finite Machines. Chapman and Hall. ISBN   978-0-486-48583-6.
  5. Karhumäki, Juhani; Petre, Ion (2002). "Conway's problem for three-word sets". Theoretical Computer Science. 289 (1): 705–725. doi: 10.1016/S0304-3975(01)00389-9 . ISSN   0304-3975.
  6. Karhumäki, Juhani; Petre, Ion (2002). The Branching Point Approach to Conway's Problem. Lecture Notes in Computer Science. Vol. 2300. pp. 69–76. doi:10.1007/3-540-45711-9_5. ISBN   978-3-540-43190-9. ISSN   0302-9743.
  7. Kunc, Michal (2007). "The Power of Commuting with Finite Sets of Words". Theory of Computing Systems. 40 (4): 521–551. doi:10.1007/s00224-006-1321-z. ISSN   1432-4350. S2CID   13406797.
  8. Kunc, Michal (2005). "Regular solutions of language inequalities and well quasi-orders". Theoretical Computer Science. 348 (2–3): 277–293. doi: 10.1016/j.tcs.2005.09.018 . ISSN   0304-3975.
  9. Parikh, Rohit; Chandra, Ashok; Halpern, Joe; Meyer, Albert (1985). "Equations between Regular Terms and an Application to Process Logic". SIAM Journal on Computing. 14 (4): 935–942. doi:10.1137/0214066. ISSN   0097-5397.
  10. Okhotin, Alexander (2010). "Decision problems for language equations". Journal of Computer and System Sciences. 76 (3–4): 251–266. doi: 10.1016/j.jcss.2009.08.002 . ISSN   0022-0000.
  11. Leiss, E.L. (1994). "Unrestricted complementation in language equations over a one-letter alphabet". Theoretical Computer Science. 132 (1–2): 71–84. doi: 10.1016/0304-3975(94)90227-5 . ISSN   0304-3975.
  12. Jeż, Artur (2008). "Conjunctive grammars generate non-regular unary languages". International Journal of Foundations of Computer Science. 19 (3): 597–615. doi:10.1142/S012905410800584X. ISSN   0129-0541.
  13. Jeż, Artur; Okhotin, Alexander (2014). "Computational completeness of equations over sets of natural numbers". Information and Computation. 237: 56–94. CiteSeerX   10.1.1.395.2250 . doi:10.1016/j.ic.2014.05.001. ISSN   0890-5401.