Automata-based programming

Last updated

Automata-based programming is a programming paradigm in which the program or part of it is thought of as a model of a finite-state machine (FSM) or any other (often more complicated) formal automaton (see automata theory). Sometimes a potentially infinite set of possible states is introduced, and such a set can have a complicated structure, not just an enumeration.

Contents

Finite-state machine-based programming is generally the same, but, formally speaking, does not cover all possible variants, as FSM stands for finite-state machine, and automata-based programming does not necessarily employ FSMs in the strict sense.

The following properties are key indicators for automata-based programming:

The whole execution of the automata-based code is a cycle of the automaton steps.

Another reason for using the notion of automata-based programming is that the programmer's style of thinking about the program in this technique is very similar to the style of thinking used to solve mathematical tasks using Turing machines, Markov algorithms, etc.

Example

Task

Consider the task of reading a text from standard input line-by-line and writing the first word of each line to standard output. First we skip all leading whitespace characters, if any. Then we print all the characters of the first word. Finally we skip all the trailing characters until a newline character is encountered. Whenever a sequence of newline characters is encountered not at the beginning of the stream, we print only the first one and skip the remaining ones; else, we skip all. Next we restart the process at the following line. Upon encountering the end-of-file condition (regardless of the stage), we stop.

Traditional program

A traditional program in C which performs the above task could look like this:

#include<ctype.h>#include<stdio.h>intmain(void){intc;do{do{c=getchar();}while(isspace(c));while(!isspace(c)&&c!=EOF){putchar(c);c=getchar();}while(c!='\n'&&c!=EOF){c=getchar();}if(c=='\n'){putchar(c);}}while(c!=EOF);return0;}

For instance, compiling and running the above program on this input:

$ clangprogram.c&&(printf"\t\v\f\r \n\n\t\v\f\r foo bar baz\n\n\t\v\f\r qux quux corge"|./a.out)

yields:

fooqux

Automata-based program

Procedural

The same task can be solved by thinking in terms of finite-state machines. The parsing of a line has three stages: skipping the leading whitespace characters, printing the characters of the first word and skipping the trailing characters. Let's call these automaton states BEFORE, INSIDE and AFTER. An automata-based version of the program could look like this:

#include<ctype.h>#include<stdio.h>enumState{BEFORE,INSIDE,AFTER};intmain(void){intc;enumStates=BEFORE;while((c=getchar())!=EOF){switch(s){caseBEFORE:if(!isspace(c)){putchar(c);s=INSIDE;}break;caseINSIDE:if(c=='\n'){putchar(c);s=BEFORE;}elseif(isspace(c)){s=AFTER;}else{putchar(c);}break;caseAFTER:if(c=='\n'){putchar(c);s=BEFORE;}break;}}return0;}

Although the program now looks longer, it has at least one significant advantage: there is only one reading (that is, call to the getchar function) instruction. Besides that, there is only one loop instead of the four the traditional version had. The body of the while loop is the automaton step and the loop itself is the cycle of the automaton step. The program implements the work of a finite-state machine shown in the state diagram.

The most important property of the program is that the automaton step code section is clearly localized. With an explicit function step for the automation step, the program better demonstrates this property:

#include<ctype.h>#include<stdio.h>enumState{BEFORE,INSIDE,AFTER};voidstep(enumState*consts,intconstc){switch(*s){caseBEFORE:if(!isspace(c)){putchar(c);*s=INSIDE;}break;caseINSIDE:if(c=='\n'){putchar(c);*s=BEFORE;}elseif(isspace(c)){*s=AFTER;}else{putchar(c);}break;caseAFTER:if(c=='\n'){putchar(c);*s=BEFORE;}break;}}intmain(void){intc;enumStates=BEFORE;while((c=getchar())!=EOF){step(&s,c);}return0;}

The program now clearly demonstrates the basic properties of automata-based code:

  • time periods of automaton step executions may not overlap;
  • the only information passed from the previous step to the next is the explicitly specified automaton state.

A finite automaton can be defined by a state-transition table whose rows stand for the current states, columns stand for the inputs, and cells stand for the next states and actions to perform.

State-transition table
Input
Current state
newlinewhitespaceother
beforebeforebeforeinside/print
insidebefore/printafterinside/print
afterbefore/printafterafter
State diagram
The state diagram of a finite-state machine that prints the first word of each line of an input stream. The machine follows exactly one transition on each step, depending on the current state and the encountered character. The action accompanying a transition is either a no-operation or the printing of the encountered character, denoted with print. Finite-state machine state-diagram.png
The state diagram of a finite-state machine that prints the first word of each line of an input stream. The machine follows exactly one transition on each step, depending on the current state and the encountered character. The action accompanying a transition is either a no-operation or the printing of the encountered character, denoted with print.

Generally speaking, an automata-based program can naturally use this approach. With an explicit two-dimensional array transitions for the state-transition table, the program uses this approach:

#include<ctype.h>#include<stdio.h>enumState{BEFORE,INSIDE,AFTER};voidnop(intconstc){}voidprint(intconstc){putchar(c);}structBranch{enumStateconstnext_state;void(*action)(int);};structBranchconsttransitions[3][3]={//   newline         whitespace         other             Inputs/States{{BEFORE,&nop},{BEFORE,&nop},{INSIDE,&print}},// before{{BEFORE,&print},{AFTER,&nop},{INSIDE,&print}},// inside{{BEFORE,&print},{AFTER,&nop},{AFTER,&nop}}// after};voidstep(enumState*consts,intconstc){intconstrow=(*s==BEFORE)?0:(*s==INSIDE)?1:2;intconstcolumn=(c=='\n')?0:isspace(c)?1:2;structBranchconst*constb=&transitions[row][column];*s=b->next_state;b->action(c);}intmain(void){intc;enumStates=BEFORE;while((c=getchar())!=EOF){step(&s,c);}return0;}

Object-oriented

If the implementation language supports object-oriented programming, a simple refactoring of the program is to encapsulate the automaton into an object, thus hiding its implementation details. The program in C++ using object-oriented style could look like this:

#include<ctype.h>#include<stdio.h>enumState{BEFORE,INSIDE,AFTER};structBranch{enumStateconstnext_state;void(*action)(int);};classStateMachine{public:StateMachine();voidfeedChar(int);protected:staticvoidnop(int);staticvoidprint(int);private:enumState_state;staticstructBranchconst_transitions[3][3];};StateMachine::StateMachine():_state(BEFORE){}voidStateMachine::feedChar(intconstc){intconstrow=(_state==BEFORE)?0:(_state==INSIDE)?1:2;intconstcolumn=(c=='\n')?0:isspace(c)?1:2;structBranchconst*constb=&_transitions[row][column];_state=b->next_state;b->action(c);}voidStateMachine::nop(intconstc){}voidStateMachine::print(intconstc){putchar(c);}structBranchconstStateMachine::_transitions[3][3]={//   newline         whitespace         other             Inputs/States{{BEFORE,&nop},{BEFORE,&nop},{INSIDE,&print}},// before{{BEFORE,&print},{AFTER,&nop},{INSIDE,&print}},// inside{{BEFORE,&print},{AFTER,&nop},{AFTER,&nop}}// after};intmain(){intc;StateMachinem;while((c=getchar())!=EOF){m.feedChar(c);}return0;}

To minimize changes not directly related to the subject of the article, the input/output getchar and putchar functions from the standard library of C are being used.

The state design pattern is a way for an object to change its behavior at runtime according to its internal state without resorting to large conditional statements or table lookups thanks to virtual function calls. Its main advantage over code using large conditional statements is that state-specific code is distributed across different objects rather than localized in a monolithic block, which improves maintainability. Its main advantages over code using state-transition tables are that virtual function calls are often more efficient than table lookups, that state-transition criteria are more explicit than in tabular format, and that it is easier to add actions accompanying state transitions. However it introduces a new problem: the number of classes makes the code less compact than the other approaches. The program using the state design pattern could look like this:

#include<ctype.h>#include<stdio.h>classStateMachine;classState{public:virtualvoidfeedChar(StateMachine*,int)const=0;};classBefore:publicState{public:staticStateconst*instantiate();virtualvoidfeedChar(StateMachine*,int)constoverride;protected:Before()=default;private:staticStateconst*_instance;};classInside:publicState{public:staticStateconst*instantiate();virtualvoidfeedChar(StateMachine*,int)constoverride;protected:Inside()=default;private:staticStateconst*_instance;};classAfter:publicState{public:staticStateconst*instantiate();virtualvoidfeedChar(StateMachine*,int)constoverride;protected:After()=default;private:staticStateconst*_instance;};classStateMachine{public:StateMachine();voidfeedChar(int);protected:voidsetState(Stateconst*);private:Stateconst*_state;friendclassBefore;friendclassInside;friendclassAfter;};Stateconst*Before::instantiate(){if(!_instance){_instance=newBefore;}return_instance;}voidBefore::feedChar(StateMachine*constm,intconstc)const{if(!isspace(c)){putchar(c);m->setState(Inside::instantiate());}}Stateconst*Before::_instance=nullptr;Stateconst*Inside::instantiate(){if(!_instance){_instance=newInside;}return_instance;}voidInside::feedChar(StateMachine*constm,intconstc)const{if(c=='\n'){putchar(c);m->setState(Before::instantiate());}elseif(isspace(c)){m->setState(After::instantiate());}else{putchar(c);}}Stateconst*Inside::_instance=nullptr;Stateconst*After::instantiate(){if(!_instance){_instance=newAfter;}return_instance;}voidAfter::feedChar(StateMachine*constm,intconstc)const{if(c=='\n'){putchar(c);m->setState(Before::instantiate());}}Stateconst*After::_instance=nullptr;StateMachine::StateMachine():_state(Before::instantiate()){}voidStateMachine::feedChar(intconstc){_state->feedChar(this,c);}voidStateMachine::setState(Stateconst*consts){_state=s;}intmain(){intc;StateMachinem;while((c=getchar())!=EOF){m.feedChar(c);}return0;}

Automation and automata

Automata-based programming indeed closely matches the programming needs found in the field of automation.

A production cycle is commonly modelled as:

Various dedicated programming languages allow expressing such a model in more or less sophisticated ways.

Automation program

The example presented above could be expressed according to this view like in the following pseudo-code ('set' activates a logic variable, 'reset' inactivates a logic variable, ':' assigns a variable, and '=' tests for equality):

newline: '\n' whitespace: ('\t', '\n', '\v', '\f', '\r', ' ') states: (before, inside, after)   setState(c) {   if before and (c != newline and c not in whitespace) then set inside   if inside then (if c in whitespace then set after else if c = newline then set before)   if after and c = newline then set before }   doAction(c) {   if before and (c != newline and c not in whitespace) then write(c)   if inside and c not in whitespace then write(c)   if after and c = newline then write(c) }   cycle {   set before    loop until (c: readCharacter) = EOL {     setState(c)     doAction(c)   } } 

The separation of routines expressing cycle progression on one side, and actual action on the other (matching input and output) allows clearer and simpler code.

Events

In the field of automation, stepping from step to step depends on input data coming from the machine itself. This is represented in the program by reading characters from a text. In reality, those data inform about position, speed, temperature, etc. of critical elements of a machine.

Like in GUI programming, changes in the machine state can thus be considered as events causing the passage from a state to another, until the final one is reached. The combination of possible states can generate a wide variety of events, thus defining a more complex production cycle. As a consequence, cycles are usually far to be simple linear sequences. There are commonly parallel branches running together and alternatives selected according to different events, schematically represented below:

   s:stage   c:condition        s1    |    |-c2    |    s2    |    ----------    |        |    |-c31    |-c32    |        |   s31       s32    |        |    |-c41    |-c42    |        |    ----------    |    s4

Applications

Automata-based programming is widely used in lexical and syntactic analyses. [1]

Besides that, thinking in terms of automata (that is, breaking the execution process down to automaton steps and passing information from step to step through the explicit automaton state) is necessary for event-driven programming as the only alternative to using parallel processes or threads.

The notions of states and state machines are often used in the field of formal specification. For instance, UML-based software architecture development uses state diagrams to specify the behaviour of the program. Also various communication protocols are often specified using the explicit notion of state (e.g., RFC   793).

Thinking in terms of automata (steps and states) can also be used to describe semantics of some programming languages. For example, the execution of a program written in the Refal language is described as a sequence of steps of a so-called abstract Refal machine; the state of the machine is a view (an arbitrary Refal expression without variables).

Continuations in the Scheme language require thinking in terms of steps and states, although Scheme itself is in no way automata-related (it is recursive). To make it possible for the call/cc feature to work, implementation needs to be able to catch a whole state of the executing program, which is only possible when there is no implicit part in the state. Such a caught state is the very thing called continuation, and it can be considered as the state of a (relatively complicated) automaton. The automaton step is deducing the next continuation from the previous one, and the execution process is the cycle of such steps.

Alexander Ollongren in his book [2] explains the so-called Vienna method of programming languages semantics description which is fully based on formal automata.

The STAT system is a good example of using the automata-based approach; this system, besides other features, includes an embedded language called STATL which is purely automata-oriented.

History

Automata-based techniques were used widely in the domains where there are algorithms based on automata theory, such as formal language analyses. [1]

One of the early papers on this is by Johnson et al., 1968. [3]

One of the earliest mentions of automata-based programming as a general technique is found in the paper by Peter Naur, 1963. [4] The author calls the technique Turing machine approach, however no real Turing machine is given in the paper; instead, the technique based on steps and states is described.

Comparison with imperative and procedural programming

The notion of state is not exclusive property of automata-based programming. [5] Generally speaking, state (or program state) appears during execution of any computer program, as a combination of all information that can change during the execution. For instance, a state of a traditional imperative program consists of

These can be divided to the explicit part (such as values stored in variables) and the implicit part (return addresses and the instruction pointer).

Having said this, an automata-based program can be considered as a special case of an imperative program, in which implicit part of the state is minimized. The state of the whole program taken at the two distinct moments of entering the step code section can differ in the automaton state only. This simplifies the analysis of the program.

Object-oriented programming relationship

In the theory of object-oriented programming, an object is said to have an internal state and is capable of receiving messages, responding to them, sending messages to other objects and changing its internal state during message handling. In more practical terminology, to call an object's method is considered the same as to send a message to the object.

Thus, on the one hand, objects from object-oriented programming can be considered as automata (or models of automata) whose state is the combination of private fields, and one or more methods are considered to be the step. Such methods must not call each other nor themselves, neither directly nor indirectly, otherwise the object can not be considered to be implemented in an automata-based manner.

On the other hand, object is good for implementing a model of an automaton. When the automata-based approach is used within an object-oriented language, an automaton model is usually implemented by a class, the state is represented with private fields of the class, and the step is implemented as a method; such a method is usually the only non-constant public method of the class (besides constructors and destructors). Other public methods could query the state but do not change it. All the secondary methods (such as particular state handlers) are usually hidden within the private part of the class.

See also

Related Research Articles

Templates are a feature of the C++ programming language that allows functions and classes to operate with generic types. This allows a function or class declaration to reference via a generic variable another different class without creating full declaration for each of these different classes.

<span class="mw-page-title-main">Abstract factory pattern</span> Software design pattern

The abstract factory pattern in software engineering is a design pattern that provides a way to create families of related objects without imposing their concrete classes, by encapsulating a group of individual factories that have a common theme without specifying their concrete classes. According to this pattern, a client software component creates a concrete implementation of the abstract factory and then uses the generic interface of the factory to create the concrete objects that are part of the family. The client does not know which concrete objects it receives from each of these internal factories, as it uses only the generic interfaces of their products. This pattern separates the details of implementation of a set of objects from their general usage and relies on object composition, as object creation is implemented in methods exposed in the factory interface.

In computer programming, lazy initialization is the tactic of delaying the creation of an object, the calculation of a value, or some other expensive process until the first time it is needed. It is a kind of lazy evaluation that refers specifically to the instantiation of objects or other resources.

The prototype pattern is a creational design pattern in software development. It is used when the types of objects to create is determined by a prototypical instance, which is cloned to produce new objects. This pattern is used to avoid subclasses of an object creator in the client application, like the factory method pattern does, and to avoid the inherent cost of creating a new object in the standard way when it is prohibitively expensive for a given application.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Java syntax</span> Set of rules defining correctly structured program

The syntax of Java is the set of rules defining how a Java program is written and interpreted.

In computer programming, a function prototype or function interface is a declaration of a function that specifies the function's name and type signature, but omits the function body. While a function definition specifies how the function does what it does, a function prototype merely specifies its interface, i.e. what data types go in and come out of it. The term "function prototype" is particularly used in the context of the programming languages C and C++ where placing forward declarations of functions in header files allows for splitting a program into translation units, i.e. into parts that a compiler can separately translate into object files, to be combined by a linker into an executable or a library. The function declaration precedes the function definition, giving details of name, return type, and storage class along with other relevant attributes.

In some programming languages, const is a type qualifier that indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time on each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

typedef is a reserved keyword in the programming languages C, C++, and Objective-C. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

A class in C++ is a user-defined type or data structure declared with any of the keywords class, struct or union that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class declared with the keyword class is private. The private members are not accessible outside the class; they can be accessed only through member functions of the class. The public members form an interface to the class and are accessible outside the class.

The C and C++ programming languages are closely related but have many significant differences. C++ began as a fork of an early, pre-standardized C, and was designed to be mostly source-and-link compatible with C compilers of the time. Due to this, development tools for the two languages are often integrated into a single product, with the programmer able to specify C or C++ as their source language.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

stdarg.h is a header in the C standard library of the C programming language that allows functions to accept an indefinite number of arguments. It provides facilities for stepping through a list of function arguments of unknown number and type. C++ provides this functionality in the header cstdarg.

In computing, compile-time function execution is the ability of a compiler, that would normally compile a function to machine code and execute it at run time, to execute the function at compile time. This is possible if the arguments to the function are known at compile time, and the function does not make any reference to or attempt to modify any global state.

This article describes the syntax of the C# programming language. The features described are compatible with .NET Framework and Mono.

In computer programming, variadic templates are templates that take a variable number of arguments.

"typename" is a keyword in the C++ programming language used when writing templates. It is used for specifying that a dependent name in a template definition or declaration is a type. In the original C++ compilers before the first ISO standard was completed, the typename keyword was not part of the C++ language and Bjarne Stroustrup used the class keyword for template arguments instead. While typename is now the preferred keyword, older source code may still use the class keyword instead.

In software engineering, the module pattern is a design pattern used to implement the concept of software modules, defined by modular programming, in a programming language with incomplete direct support for the concept.

re2c is a free and open-source lexer generator for C, C++, Go, and Rust. It compiles declarative regular expression specifications to deterministic finite automata. Originally written by Peter Bumbulis and described in his paper, re2c was put in public domain and has been since maintained by volunteers. It is the lexer generator adopted by projects such as PHP, SpamAssassin, Ninja build system and others. Together with the Lemon parser generator, re2c is used in BRL-CAD. This combination is also used with STEPcode, an implementation of ISO 10303 standard.

References

  1. 1 2 Aho, Alfred V.; Ullman, Jeffrey D. (1973). The theory of parsing, translation and compiling. Vol. 1. Englewood Cliffs, N. J.: Prentice-Hall. ISBN   0-13-914564-8.
  2. Ollongren, Alexander (1974). Definition of programming languages by interpreting automata. London: Academic Press. ISBN   0-12-525750-3.
  3. Johnson, W. L.; Porter, J. H.; Ackley, S. I.; Ross, D. T. (1968). "Automatic generation of efficient lexical processors using finite state techniques". Comm ACM. 11 (12): 805–813. doi: 10.1145/364175.364185 . S2CID   17253809.
  4. Naur, Peter (September 1963). "The design of the GIER ALGOL compiler Part II". BIT Numerical Mathematics. 3 (3): 145–166. doi:10.1007/BF01939983. S2CID   189785656.
  5. "Automata-based programming" (PDF). Scientific and Technical Journal of Information Technologies, Mechanics and Optics (53). 2008.