Markov algorithm

Last updated

In theoretical computer science, a Markov algorithm is a string rewriting system that uses grammar-like rules to operate on strings of symbols. Markov algorithms have been shown to be Turing-complete, which means that they are suitable as a general model of computation and can represent any mathematical expression from its simple notation. Markov algorithms are named after the Soviet mathematician Andrey Markov, Jr.

Contents

Refal is a programming language based on Markov algorithms.

Description

Normal algorithms are verbal, that is, intended to be applied to strings in different alphabets.

The definition of any normal algorithm consists of two parts: an alphabet, which is a set of symbols, and a scheme. The algorithm is applied to strings of symbols of the alphabet. The scheme is a finite ordered set of substitution formulas. Each formula can be either simple or final. Simple substitution formulas are represented by strings of the form , where and are two arbitrary strings in the alphabet. Similarly, final substitution formulas are represented by strings of the form .

Here is an example of a normal algorithm scheme in the five-letter alphabet :

The process of applying the normal algorithm to an arbitrary string in the alphabet of this algorithm is a discrete sequence of elementary steps, consisting of the following. Let’s assume that is the word obtained in the previous step of the algorithm (or the original word , if the current step is the first). If of the substitution formulas there is no left-hand side which is included in the , then the algorithm terminates, and the result of its work is considered to be the string . Otherwise, the first of the substitution formulae whose left sides are included in is selected. If the substitution formula is of the form , then out of all of possible representations of the string of the form (where and are arbitrary strings) the one with the shortest is chosen. Then the algorithm terminates and the result of its work is considered to be . However, if this substitution formula is of the form , then out of all of the possible representations of the string of the form of the one with the shortest is chosen, after which the string is considered to be the result of the current step, subject to further processing in the next step.

For example, the process of applying the algorithm described above to the word results in the sequence of words , , , , , , , , , and , after which the algorithm stops with the result .

For other examples, see below.

Any normal algorithm is equivalent to some Turing machine, and vice versa any Turing machine is equivalent to some normal algorithm. A version of the Church-Turing thesis formulated in relation to the normal algorithm is called the "principle of normalization."

Normal algorithms have proved to be a convenient means for the construction of many sections of constructive mathematics. Moreover, inherent in the definition of a normal algorithm are a number of ideas used in programming languages aimed at handling symbolic information for example, in Refal.

Algorithm

The Rules are a sequence of pairs of strings, usually presented in the form of patternreplacement. Each rule may be either ordinary or terminating.

Given an input string:

  1. Check the Rules in order from top to bottom to see whether any of the patterns can be found in the input string.
  2. If none is found, the algorithm stops.
  3. If one (or more) is found, use the first of them to replace the leftmost occurrence of matched text in the input string with its replacement.
  4. If the rule just applied was a terminating one, the algorithm stops.
  5. Go to step 1.

Note that after each rule application the search starts over from the first rule.

Example

The following example shows the basic operation of a Markov algorithm.

Rules

  1. "A" -> "apple"
  2. "B" -> "bag"
  3. "S" -> "shop"
  4. "T" -> "the"
  5. "the shop" -> "my brother"
  6. "a never used" -> ."terminating rule"

Symbol string

"I bought a B of As from T S."

Execution

If the algorithm is applied to the above example, the Symbol string will change in the following manner.

  1. "I bought a B of As from T S."
  2. "I bought a B of apples from T S."
  3. "I bought a bag of apples from T S."
  4. "I bought a bag of apples from T shop."
  5. "I bought a bag of apples from the shop."
  6. "I bought a bag of apples from my brother."

The algorithm will then terminate.

Another example

These rules give a more interesting example. They rewrite binary numbers to their unary counterparts. For example, 101 will be rewritten to a string of 5 consecutive bars.

Rules

  1. "|0" -> "0||"
  2. "1" -> "0|"
  3. "0" -> ""

Symbol string

"101"

Execution

If the algorithm is applied to the above example, it will terminate after the following steps.

  1. "101"
  2. "0|01"
  3. "00||1"
  4. "00||0|"
  5. "00|0|||"
  6. "000|||||"
  7. "00|||||"
  8. "0|||||"
  9. "|||||"

See also

Related Research Articles

<span class="mw-page-title-main">Kolmogorov complexity</span> Measure of algorithmic complexity

In algorithmic information theory, the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963 and is a generalization of classical information theory.

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

In computer science, the Cocke–Younger–Kasami algorithm is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke, Daniel Younger, Tadao Kasami, and Jacob T. Schwartz. It employs bottom-up parsing and dynamic programming.

In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings are to one another, that is measured by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.

In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems. In their most basic form, they consist of a set of objects, plus relations on how to transform those objects.

<span class="mw-page-title-main">Complexity class</span> Set of problems in computational complexity theory

In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory.

<span class="mw-page-title-main">Deterministic finite automaton</span> Finite-state machine

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string. Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

Thue is an esoteric programming language invented by John Colagioia in early 2000. It is a meta-language that can be used to define or recognize Type-0 languages from the Chomsky hierarchy. Because it is able to define languages of such complexity, it is also Turing-complete itself. Thue is based on a nondeterministic string rewriting system called semi-Thue grammar, which itself is named after the Norwegian mathematician Axel Thue. The author describes it as follows: "Thue represents one of the simplest possible ways to construe constraint-based programming. It is to the constraint-based paradigm what languages like OISC are to the imperative paradigm; in other words, it's a tar pit."

In propositional logic, a propositional formula is a type of syntactic formula which is well formed and has a truth value. If the values of all variables in a propositional formula are given, it determines a unique truth value. A propositional formula may also be called a propositional expression, a sentence, or a sentential formula.

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

<span class="mw-page-title-main">Generalized suffix tree</span>

In computer science, a generalized suffix tree is a suffix tree for a set of strings. Given the set of strings of total length , it is a Patricia tree containing all suffixes of the strings. It is mostly used in bioinformatics.

In computational mathematics, a word problem is the problem of deciding whether two given expressions are equivalent with respect to a set of rewriting identities. A prototypical example is the word problem for groups, but there are many other instances as well. A deep result of computational theory is that answering this question is in many important cases undecidable.

In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite, countable, or even uncountable.

In mathematical logic, a tautology is a formula or assertion that is true in every possible interpretation. An example is "x=y or x≠y". Similarly, "either the ball is green, or the ball is not green" is always true, regardless of the colour of the ball.

An interpretation is an assignment of meaning to the symbols of a formal language. Many formal languages used in mathematics, logic, and theoretical computer science are defined in solely syntactic terms, and as such do not have any meaning until they are given some interpretation. The general study of interpretations of formal languages is called formal semantics.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

In mathematics, logic and computer science, a formal language is called recursive if it is a recursive subset of the set of all possible finite sequences over the alphabet of the language. Equivalently, a formal language is recursive if there exists a Turing machine that, when given a finite sequence of symbols as input, always halts and accepts it if it belongs to the language and halts and rejects it otherwise. In Theoretical computer science, such always-halting Turing machines are called total Turing machines or algorithms. Recursive languages are also called decidable.

<span class="mw-page-title-main">Wavelet Tree</span>

The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes the and operations defined on bitvectors to arbitrary alphabets.

References

  1. Kushner, Boris A. (1999-05-28). "Markov's constructive analysis; a participant's view". Theoretical Computer Science. 219 (1–2): 268, 284. doi: 10.1016/S0304-3975(98)00291-6 .