Linear grammar

Last updated

In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.

Contents

A linear language is a language generated by some linear grammar.

Example

An example of a linear grammar is G with N = {S}, Σ = {a, b}, P with start symbol S and rules

S aSb
S ε

It generates the language .

Relationship with regular grammars

Two special types of linear grammars are the following:

Each of these can describe exactly the regular languages. A regular grammar is a grammar that is left-linear or right-linear.

Observe that by inserting new nonterminals, any linear grammar can be replaced by an equivalent one where some of the rules are left-linear and some are right-linear. For instance, the rules of G above can be replaced with

S aA
A Sb
S ε

However, the requirement that all rules be left-linear (or all rules be right-linear) leads to a strict decrease in the expressive power of linear grammars.

Expressive power

All regular languages are linear; conversely, an example of a linear, non-regular language is { anbn }. as explained above. All linear languages are context-free; conversely, an example of a context-free, non-linear language is the Dyck language of well-balanced bracket pairs. Hence, the regular languages are a proper subset of the linear languages, which in turn are a proper subset of the context-free languages.

While regular languages are deterministic, there exist linear languages that are nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the linear grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first which means that a pushdown automaton has to try alternative state transitions to accommodate for the different possible lengths of a semi-parsed string. [1] This language is nondeterministic. Since nondeterministic context-free languages cannot be accepted in linear time, linear languages cannot be accepted in linear time in the general case. Furthermore, it is undecidable whether a given context-free language is a linear context-free language. [2]

A language is linear iff it can be generated by a one-turn pushdown automaton – a pushdown automaton that, once it starts popping, never pushes again.

Closure properties

Positive cases

Linear languages are closed under union. Construction is the same as the construction for the union of context-free languages. Let be two linear languages, then is constructed by a linear grammar with , and playing the role of the linear grammars for .

If L is a linear language and M is a regular language, then the intersection is again a linear language; in other words, the linear languages are closed under intersection with regular sets.

Linear languages are closed under homomorphism and inverse homomorphism. [3]

As a corollary, linear languages form a full trio. Full trios in general are language families that enjoy a couple of other desirable mathematical properties.

Negative cases

Linear languages are not closed under intersection. For example, let , then their intersection is not only not linear, but also not context-free. See pumping lemma for context-free languages.

As a corollary, linear languages are not closed under complement (as intersection can be constructed by de Morgan's laws out of union and complement).

Related Research Articles

The Chomsky hierarchy, in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. This hierarchy of grammars was described by Noam Chomsky in 1956.

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar. Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

<span class="mw-page-title-main">Pushdown automaton</span> Type of automaton

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In theoretical computer science and formal language theory, a regular grammar is a grammar that is right-regular or left-regular. While their exact definition varies from textbook to textbook, they all require that

In formal language theory, a context-free grammar is in Greibach normal form (GNF) if the right-hand sides of all production rules start with a terminal symbol, optionally followed by some variables. A non-strict form allows one exception to this format restriction for allowing the empty word to be a member of the described language. The normal form was established by Sheila Greibach and it bears her name.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree. Every non-empty context-free language admits an ambiguous grammar by introducing e.g. a duplicate rule. A language that only admits ambiguous grammars is called an inherently ambiguous language. Deterministic context-free grammars are always unambiguous, and are an important subclass of unambiguous grammars; there are non-deterministic unambiguous grammars, however.

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

In automata theory, a deterministic pushdown automaton is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages.

In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, meaning that they admit an unambiguous grammar. There are non-deterministic unambiguous CFLs, so DCFLs form a proper subset of unambiguous CFLs.

Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.

An embedded pushdown automaton or EPDA is a computational model for parsing languages generated by tree-adjoining grammars (TAGs). It is similar to the context-free grammar-parsing pushdown automaton, but instead of using a plain stack to store symbols, it has a stack of iterated stacks that store symbols, giving TAGs a generative capacity between context-free and context-sensitive grammars, or a subset of mildly context-sensitive grammars. Embedded pushdown automata should not be confused with nested stack automata which have more computational power.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

<span class="mw-page-title-main">LL grammar</span> Type of a context-free grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

References

  1. Hopcroft, John; Rajeev Motwani; Jeffrey Ullman (2001). Introduction to automata theory, languages, and computation 2nd edition . Addison-Wesley. pp. 249–253.
  2. Greibach, Sheila (October 1966). "The Unsolvability of the Recognition of Linear Context-Free Languages". Journal of the ACM. 13 (4): 582–587. doi: 10.1145/321356.321365 . S2CID   37003419.
  3. John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation , Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN   0-201-02988-X., Ex. 11.1, pp. 282f