Thue (programming language)

Last updated

Thue ( /ˈt/ TOO-ay) is an esoteric programming language invented by John Colagioia in early 2000. It is a meta-language that can be used to define or recognize Type-0 languages from the Chomsky hierarchy. Because it is able to define languages of such complexity, it is also Turing-complete itself. Thue is based on a nondeterministic string rewriting system called semi-Thue grammar, which itself is named after the Norwegian mathematician Axel Thue. The author describes it as follows: "Thue represents one of the simplest possible ways to construe constraint-based programming. It is to the constraint-based paradigm what languages like OISC are to the imperative paradigm; in other words, it's a tar pit."


Production rules

A Thue program starts with a rulebase, which is a series of substitution rules, each of this form:

lhs ::= rhs

The rulebase terminates with a lone production symbol on a line:


The initial state is a series of symbols which follow the rulebase.

Thue consumes the initial symbols and substitutes the result of the rules for each of the initial state's symbols.

Thue terminates when lhs cannot be found in a resultant state.


Calling Thue

When invoked with 'd' (debug), print the state. When invoked with 'l' (left side), apply the rules left-to-right. When invoked with 'r' (right side), apply the rules right-to-left. The last 'l' or 'r' overrides the previous switches.

Sample programs

Here is the traditional "Hello World!" in Thue:

a::=~Hello World! ::= a

The following Thue program performs an increment of a binary number entered as the initial state surrounded by "_" characters, in this case the number 1111111111:

1_::=1++ 0_::=1  01++::=10 11++::=1++0  _0::=_ _1++::=10  __::=1  ::=  _1111111111_

The following sample program is to demonstrate Thue's nondeterminism (and to show an example of an infinite loop, besides). The program outputs bits in an undefined (and quite possibly random) sequence.

b::=~0 b::=~1 ac::=abc ::= abc

See also

Related Research Articles

Context-free grammar Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, and GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

L-system Rewriting system and type of formal grammar

An L-system or Lindenmayer system is a parallel rewriting system and a type of formal grammar. An L-system consists of an alphabet of symbols that can be used to make strings, a collection of production rules that expand each symbol into some larger string of symbols, an initial "axiom" string from which to begin construction, and a mechanism for translating the generated strings into geometric structures. L-systems were introduced and developed in 1968 by Aristid Lindenmayer, a Hungarian theoretical biologist and botanist at the University of Utrecht. Lindenmayer used L-systems to describe the behaviour of plant cells and to model the growth processes of plant development. L-systems have also been used to model the morphology of a variety of organisms and can be used to generate self-similar fractals.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. They are applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory.

In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programming language. They are extensions of the basic Backus–Naur form (BNF) metasyntax notation.

In theoretical computer science, a Markov algorithm is a string rewriting system that uses grammar-like rules to operate on strings of symbols. Markov algorithms have been shown to be Turing-complete, which means that they are suitable as a general model of computation and can represent any mathematical expression from its simple notation. Markov algorithms are named after the Soviet mathematician Andrey Markov, Jr.

In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems. In their most basic form, they consist of a set of objects, plus relations on how to transform those objects.

In computer science, an operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For example, most calculators use operator precedence parsers to convert from the human-readable infix notation relying on order of operations to a format that is optimized for evaluation such as Reverse Polish notation (RPN).

A "production system " is a computer program typically used to provide some form of artificial intelligence, which consists primarily of a set of rules about behavior but it also includes the mechanism necessary to follow those rules as the system responds to states of the world. Those rules, termed productions, are a basic representation found useful in automated planning, expert systems and action selection.

In automata theory, the class of unrestricted grammars is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted grammar, other than each of their left-hand sides being non-empty. This grammar class can generate arbitrary recursively enumerable languages.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

Shape grammars in computation are a specific class of production systems that generate geometric shapes. Typically, shapes are 2- or 3-dimensional, thus shape grammars are a way to study 2- and 3-dimensional languages. Shape grammars were first introduced in a seminal article by George Stiny and James Gips in 1971. The mathematical and algorithmic foundations of shape grammars were developed in "Pictorial and Formal Aspects of Shapes and Shape Grammars" by George Stiny. Applications of shape grammars were first considered in "Shape Grammars and their Uses" by James Gips. These publications also contain two independent, though equivalent, constructions showing that shape grammars can simulate Turing machines.

In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. Terminal symbols are the elementary symbols of the language defined by a formal grammar. Nonterminal symbols are replaced by groups of terminal symbols according to the production rules.

The Syntax Definition Formalism (SDF) is a metasyntax used to define context-free grammars: that is, a formal way to describe formal languages. It can express the entire range of context-free grammars. Its current version is SDF3. A parser and parser generator for SDF specifications are provided as part of the free ASF+SDF Meta Environment. These operate using the SGLR. An SDF parser outputs parse trees or, in the case of ambiguities, parse forests.

Refal "is a functional programming language oriented toward symbolic computations", including "string processing, language translation, [and] artificial intelligence". It is one of the oldest members of this family, first conceived of in 1966 as a theoretical tool, with the first implementation appearing in 1968. Refal was intended to combine mathematical simplicity with practicality for writing large and sophisticated programs.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

In computational linguistics, JAPE is the Java Annotation Patterns Engine, a component of the open-source General Architecture for Text Engineering (GATE) platform. JAPE is a finite state transducer that operates over annotations based on regular expressions. Thus it is useful for pattern-matching, semantic extraction, and many other operations over syntactic trees such as those produced by natural language parsers.

Charm is a computer programming language devised in the early 1990s with similarities to the RTL/2, Pascal and C languages in addition to containing some unique features of its own. The Charm language is defined by a context-free grammar amenable to being processed by recursive descent parser as described in seminal books on compiler design.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.