Thue (programming language)

Last updated

Thue ( /ˈt/ TOO-ay) is an esoteric programming language invented by John Colagioia in early 2000. It is a meta-language that can be used to define or recognize Type-0 languages from the Chomsky hierarchy. Because it is able to define languages of such complexity, it is also Turing-complete itself. Thue is based on a nondeterministic string rewriting system called a semi-Thue grammar, named after the Norwegian mathematician Axel Thue. Colagioia describes the language as follows: "Thue represents one of the simplest possible ways to construe constraint-based programming. It is to the constraint-based paradigm what languages like OISC are to the imperative paradigm; in other words, it's a tar pit."

Contents

Production rules

A Thue program starts with a rulebase, which is a series of substitution rules, each of this form:

lhs ::= rhs

The rulebase terminates with a lone production symbol on a line:

::=

The initial state is a series of symbols which follow the rulebase.

Thue consumes the initial symbols and substitutes the result of the rules for each of the initial state's symbols.

Thue terminates when lhs cannot be found in a resultant state.

Notes

Calling Thue

When invoked with 'd' (debug), print the state. When invoked with 'l' (left side), apply the rules left-to-right. When invoked with 'r' (right side), apply the rules right-to-left. The last 'l' or 'r' overrides the previous switches.

Sample programs

Here is the traditional "Hello World!" in Thue:

a::=~Hello World! ::= a

The following Thue program performs an increment of a binary number entered as the initial state surrounded by "_" characters, in this case the number 1111111111:

1_::=1++ 0_::=1  01++::=10 11++::=1++0  _0::=_ _1++::=10  __::=1  ::=  _1111111111_

The following sample program is to demonstrate Thue's nondeterminism (and to show an example of an infinite loop, besides). The program outputs bits in an undefined (and quite possibly random) sequence.

b::=~0 b::=~1 ac::=abc ::= abc

See also


Related Research Articles

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, canonical LR(1) parsers, minimal LR(1) parsers, and generalized LR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

<span class="mw-page-title-main">L-system</span> Rewriting system and type of formal grammar

An L-system or Lindenmayer system is a parallel rewriting system and a type of formal grammar. An L-system consists of an alphabet of symbols that can be used to make strings, a collection of production rules that expand each symbol into some larger string of symbols, an initial "axiom" string from which to begin construction, and a mechanism for translating the generated strings into geometric structures. L-systems were introduced and developed in 1968 by Aristid Lindenmayer, a Hungarian theoretical biologist and botanist at the University of Utrecht. Lindenmayer used L-systems to describe the behaviour of plant cells and to model the growth processes of plant development. L-systems have also been used to model the morphology of a variety of organisms and can be used to generate self-similar fractals.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, Backus–Naur form is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.

In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programming language. They are extensions of the basic Backus–Naur form (BNF) metasyntax notation.

In theoretical computer science, a Markov algorithm is a string rewriting system that uses grammar-like rules to operate on strings of symbols. Markov algorithms have been shown to be Turing-complete, which means that they are suitable as a general model of computation and can represent any mathematical expression from its simple notation. Markov algorithms are named after the Soviet mathematician Andrey Markov, Jr.

In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.

In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems. In their most basic form, they consist of a set of objects, plus relations on how to transform those objects.

In computer science, an operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For example, most calculators use operator precedence parsers to convert from the human-readable infix notation relying on order of operations to a format that is optimized for evaluation such as Reverse Polish notation (RPN).

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

A "production system" is a computer program typically used to provide some form of artificial intelligence, which consists primarily of a set of rules about behavior, but it also includes the mechanism necessary to follow those rules as the system responds to states of the world. Those rules, termed productions, are a basic representation found useful in automated planning, expert systems and action selection.

In automata theory, the class of unrestricted grammars is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted grammar, other than each of their left-hand sides being non-empty. This grammar class can generate arbitrary recursively enumerable languages.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

Shape grammars in computation are a specific class of production systems that generate geometric shapes. Typically, shapes are 2- or 3-dimensional, thus shape grammars are a way to study 2- and 3-dimensional languages. Shape grammars were first introduced in a seminal article by George Stiny and James Gips in 1971. The mathematical and algorithmic foundations of shape grammars were developed in "Pictorial and Formal Aspects of Shapes and Shape Grammars" by George Stiny. Applications of shape grammars were first considered in "Shape Grammars and their Uses" by James Gips. These publications also contain two independent, though equivalent, constructions showing that shape grammars can simulate Turing machines.

<span class="mw-page-title-main">Terminal and nonterminal symbols</span> Categories of symbols in formal grammars

In formal languages, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. Terminal symbols are the elementary symbols of the language defined as part of a formal grammar. Nonterminal symbols are replaced by groups of terminal symbols according to the production rules.

The Syntax Definition Formalism (SDF) is a metasyntax used to define context-free grammars: that is, a formal way to describe formal languages. It can express the entire range of context-free grammars. Its current version is SDF3. A parser and parser generator for SDF specifications are provided as part of the free ASF+SDF Meta Environment. These operate using the SGLR. An SDF parser outputs parse trees or, in the case of ambiguities, parse forests.

Refal "is a functional programming language oriented toward symbolic computations", including "string processing, language translation, [and] artificial intelligence". It is one of the oldest members of this family, first conceived of in 1966 as a theoretical tool, with the first implementation appearing in 1968. Refal was intended to combine mathematical simplicity with practicality for writing large and sophisticated programs.

<span class="mw-page-title-main">Formal grammar</span> Structure of a formal language

A formal grammar describes which strings from an alphabet of a formal language are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

Charm is a computer programming language devised in the early 1990s with similarities to the RTL/2, Pascal and C languages in addition to containing some unique features of its own. The Charm language is defined by a context-free grammar amenable to being processed by recursive descent parser as described in seminal books on compiler design.