Post canonical system

A Post canonical system, as created by Emil Post, is a string-manipulation system that starts with finitely many strings and repeatedly transforms them by applying a finite set of specified rules of a certain form, thus generating a formal language. Today such systems are mainly of historical relevance because every Post canonical system can be reduced to a string rewriting system (semi-Thue system), which is a simpler formulation. Both formalisms are Turing complete.

Definition

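A Post canonical system is a triplet (A,I,R), where

A is a finite alphabet, and finite (possibly empty) strings on A are called words;

I is a finite set of initial words;

R is a finite set of string-transforming rules, called production rules, each of the following form:

    g_{1,0} $_{1,1} g_{1,1} $_{1,2} g_{1,2} ... $_{1,m1} g_{1,m1}
    g_{2,0} $_{2,1} g_{2,1} $_{2,2} g_{2,2} ... $_{2,m2} g_{2,m2}
    ...
    g_{k,0} $_{k,1} g_{k,1} $_{k,2} g_{k,2} ... $_{k,mk} g_{k,mk}
                          ↓
    h_0 $'_1 h_1 $'_2 h_2 ... $'_n h_n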

where each g and h is a specified fixed word, and each $ and $' is a variable standing for an arbitrary word. The strings before and after the arrow in a production rule are called the rule's antecedents and consequent, respectively. It is required that each $' in the consequent be one of the $s in the antecedents of that rule, and that each antecedent and consequent contain at least one variable.

In many contexts, each production rule has only one antecedent, thus taking the simpler form

    g_0 $_1 g_1 $_2 g_2 ... $_m g_m  →  h_0 $'_1 h_1 $'_2 h_2 ... $'_n h_n

The formal language generated by a Post canonical system is the set whose elements are the initial words together with all words obtainable from them by repeated application of the production rules. Such sets are recursively enumerable languages, and every recursively enumerable language is the restriction of some such set to a sub-alphabet of A.
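The closure described above can be sketched in Python for the single-antecedent case. The encoding is an assumption made for illustration: a pattern is a list whose `str` items are fixed words and whose `int` items are numbered variables, and the helper names `matches`, `apply_rule` and `language` are hypothetical. Because the full language may be infinite, the search is cut off at a maximum word length; this bound can miss words whose derivation passes through longer intermediate words.

```python
def matches(pattern, word, env=None):
    """Yield every variable binding under which `pattern` spells out `word`.

    Assumed encoding: str items are fixed words, int items are variables.
    """
    env = env or {}
    if not pattern:
        if word == "":
            yield env
        return
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, str):                 # fixed word: must be a prefix
        if word.startswith(head):
            yield from matches(rest, word[len(head):], env)
    elif head in env:                         # variable already bound: must repeat
        bound = env[head]
        if word.startswith(bound):
            yield from matches(rest, word[len(bound):], env)
    else:                                     # unbound variable: try every prefix
        for i in range(len(word) + 1):
            yield from matches(rest, word[i:], {**env, head: word[:i]})


def apply_rule(rule, word):
    """Return all words produced from `word` by one application of `rule`."""
    antecedent, consequent = rule
    return {
        "".join(t if isinstance(t, str) else env[t] for t in consequent)
        for env in matches(antecedent, word)
    }


def language(initial, rules, max_len):
    """Collect words of length <= max_len reachable from the initial words."""
    seen = {w for w in initial if len(w) <= max_len}
    frontier = list(seen)
    while frontier:
        word = frontier.pop()
        for rule in rules:
            for new in apply_rule(rule, word):
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
    return seen


rules = [([1], ["a", 1, "b"])]                # the rule  $ -> a$b
print(sorted(language({"ab"}, rules, max_len=6), key=len))
# ['ab', 'aabb', 'aaabbb']
```

Rules with several antecedents would need the matcher to bind variables across more than one already-derived word; that extension is left out of this sketch.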

Example (well-formed bracket expressions)

Alphabet: {[, ]}
Initial word: []
Production rules:

    (1)  $ → [$]
    (2)  $ → $$
    (3)  $1$2 → $1[]$2

Derivation of a few words in the language of well-formed bracket expressions:

    []               initial word
    [][]             by (2)
    [[][]]           by (1)
    [[][]][[][]]     by (2)
    [[][]][][[][]]   by (3)
    ...
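As a quick check of this example, the following Python sketch hard-codes the three rules (wrap in brackets, duplicate, and insert `[]` at any split point) and enumerates every word up to a given length. The function names are hypothetical. The length bound loses nothing here because each of the three rules makes the word strictly longer.

```python
def step(word):
    """All words obtainable from `word` by one application of rules (1)-(3)."""
    results = {"[" + word + "]", word + word}      # rule (1) and rule (2)
    for i in range(len(word) + 1):                  # rule (3): every split $1|$2
        results.add(word[:i] + "[]" + word[i:])
    return results


def generate(max_len):
    """All generated words of length <= max_len, starting from the word []."""
    seen = {"[]"}
    frontier = ["[]"]
    while frontier:
        word = frontier.pop()
        for new in step(word):
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return seen


def is_balanced(word):
    """True if the brackets in `word` are properly nested."""
    depth = 0
    for ch in word:
        depth += 1 if ch == "[" else -1
        if depth < 0:
            return False
    return depth == 0


words = generate(6)
print(len(words), all(is_balanced(w) for w in words))   # 8 True
```

The count 8 is the number of non-empty well-formed bracket words of length 2, 4 and 6 (1 + 2 + 5), so up to this length the system generates exactly the well-formed expressions.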

Normal-form theorem

A Post canonical system is said to be in normal form if it has only one initial word and every production rule is of the simple form

    g$ → $h

Post (1943) proved the remarkable Normal-form Theorem, which applies to the most general type of Post canonical system:

Given any Post canonical system on an alphabet A, a Post canonical system in normal form can be constructed from it, possibly enlarging the alphabet, such that the set of words involving only letters of A that are generated by the normal-form system is exactly the set of words generated by the original system.

Tag systems, which comprise a universal computational model, are notable examples of Post normal-form systems, being also monogenic. (A canonical system is said to be monogenic if, given any string, at most one new string can be produced from it in one step, i.e., the system is deterministic.)
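A tag system step can be sketched in Python as follows. Each step removes a fixed number `m` of symbols from the front of the word and appends a string chosen by the first symbol removed, which is an application of a normal-form rule that strips a prefix g and appends a suffix h. The function names and the sample productions are illustrative assumptions, not taken from this article.

```python
def tag_step(word, m, productions):
    """One step of an m-tag system, or None if the system halts on `word`."""
    if len(word) < m or word[0] not in productions:
        return None
    # Drop the first m symbols, then append the production of the first symbol.
    return word[m:] + productions[word[0]]


def run(word, m, productions, max_steps):
    """Return the list of words visited, stopping at a halt or after max_steps."""
    trace = [word]
    for _ in range(max_steps):
        word = tag_step(word, m, productions)
        if word is None:
            break
        trace.append(word)
    return trace


# Illustrative 2-tag system (assumed productions).
productions = {"a": "bc", "b": "a", "c": "aaa"}
print(run("aaa", 2, productions, max_steps=4))
# ['aaa', 'abc', 'cbc', 'caaa', 'aaaaa']
```

Since the next word depends only on the current one, at most one successor exists at each step, which is the monogenic property described above.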

String rewriting systems, type-0 formal grammars

A string rewriting system is a special type of Post canonical system with a single initial word, in which each production is of the form

    $1 g $2 → $1 h $2

That is, each production rule is a simple substitution rule, often written in the form g → h. It has been proved that any Post canonical system is reducible to such a substitution system, which, as a formal grammar, is also called a phrase-structure grammar, or a type-0 grammar in the Chomsky hierarchy.
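A single rewriting step of such a substitution system can be sketched in Python: a rule that replaces g by h applies at every position where g occurs as a substring, and each position yields one successor word. The function name `rewrite_once` and the rule encoding as `(g, h)` pairs are assumptions made for illustration.

```python
def rewrite_once(word, rules):
    """All words reachable from `word` by one application of some rule (g, h)."""
    successors = set()
    for g, h in rules:
        start = word.find(g)
        while start != -1:
            # Replace this occurrence of g, keeping the surrounding context.
            successors.add(word[:start] + h + word[start + len(g):])
            start = word.find(g, start + 1)     # also catches overlapping matches
    return successors


print(sorted(rewrite_once("abab", [("ab", "c")])))
# ['abc', 'cab']
```

Unlike a tag system, this step is generally nondeterministic, since one word may have several successors.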


References