| Paradigm | Dataflow |
|---|---|
| First appeared | 2001 |
| Platform | Platform-independent |
| Filename extensions | .cal, .xdf |
| Major implementations | Open RVC-CAL Compiler, OpenDF framework |
CAL (the Cal Actor Language) is a high-level programming language [1] for writing (dataflow) actors, which are stateful operators that transform input streams of data objects (tokens) into output streams. CAL has been compiled to a variety of target platforms, including single-core processors, multicore processors, and programmable hardware. It has been used in several application areas, including video processing, compression, and cryptography. The MPEG Reconfigurable Video Coding (RVC) [2] working group has adopted CAL as part of their standardization efforts.
The CAL Actor Language was developed in 2001 as part of the Ptolemy II project at the University of California, Berkeley. CAL is a dataflow language geared towards a variety of application domains, such as multimedia processing, control systems, and network processing.
A common reason for choosing dataflow is that the goal is an efficient parallel implementation, which would be difficult or impossible to achieve with a sequential programming language. Sequential languages are notoriously difficult to parallelize in general, so efficient parallel implementations usually require significant guidance from the user. A CAL dataflow program provides simple, understandable, and powerful abstractions that allow the specification of as much or as little parallelism as is required, enabling tools to produce sophisticated implementations that exploit the concurrent structure of a computation.
When programming in dataflow, the programmer is typically constructing a concurrent description of a computational system, which is different from a common sequential program. Rather than being concerned with the step-by-step execution of an algorithm, a dataflow programmer builds a system of asynchronously communicating entities called actors. Much of the programming effort is directed toward finding a good factoring of the problem into actors, and toward engineering appropriate communication patterns among those actors.
Actors perform their computation in a sequence of steps called firings. In each of those steps:
- the actor may consume tokens from its input ports,
- it may modify its internal state,
- it may produce tokens at its output ports.
Consequently, describing an actor involves describing its interface to the outside, the ports, the structure of its internal state, as well as the steps it can perform, what these steps do (in terms of token production and consumption, and the update of the actor state), and how to pick the step that the actor will perform next. This section discusses some of the constructs in the CAL language that deal with these issues. Actions describe the things that happen during a step that an actor takes. In fact, it is accurate to say that a step consists of executing an action. Recall that when an actor takes a step, it may consume input tokens and produce output tokens.
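A minimal example, written as a sketch in CAL syntax (the Add actor and its names are illustrative):

```
// Add consumes one token from each input port per firing
// and produces their sum on the output port.
actor Add () Input1, Input2 ==> Output:
  action [a], [b] ==> [a + b] end
end
```

Each firing of the single action consumes one token from Input1 and one from Input2, and produces one token on Output.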
The input side of an action is described by its input patterns, which do the following:
- they define how many tokens the action consumes from each input port when it fires,
- they declare the variable symbols by which those tokens are referred to within the action, and
- they define a firing condition for the action, i.e. how many tokens must be available on each port before the action can fire.
The output side of an action is a little simpler: the output expressions simply define the number and values of the output tokens that will be produced on each output port by each firing of the action. If an action provides as many input patterns as there are input ports, or as many output expressions as there are output ports, it is permissible to omit the explicit naming of the port that each pattern or expression applies to; in such a case, the patterns or expressions are matched by position against the port declarations.
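For instance, a sketch with explicitly named ports (the Swap actor and its port names are hypothetical):

```
// Named input patterns and output expressions make the
// port/pattern association explicit.
actor Swap () A, B ==> X, Y:
  action A: [a], B: [b] ==> X: [b], Y: [a] end
end
```

Because Swap has as many patterns as input ports and as many output expressions as output ports, the port names could also be omitted and the patterns matched positionally.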
One way of thinking about an actor is as an operator on streams of data — sequences of tokens enter it on its input ports, and sequences of tokens leave it on its output ports. When discussing the operation of an actor, it is often useful to look at it as an operator on streams. Actors can have parameters. They act as constants during the actor execution, and are given a concrete value when an actor is instantiated as part of an actor network. The main purpose of actor parameters is to allow programmers to specify families of related actors, without having to duplicate a lot of code.
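A sketch of a parameterized actor (the Scale actor and its parameter name are hypothetical):

```
// factor receives its value when the actor is instantiated
// and then acts as a constant during execution.
actor Scale (factor) Input ==> Output:
  action [x] ==> [factor * x] end
end
```

Instantiating Scale with different values of factor yields a family of related actors without duplicating any code.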
A non-deterministic actor is one that, for the same input sequences, allows more than one run and more than one possible output. Non-determinism can be very powerful when used appropriately, but it can also be a very troublesome source of errors. A particular concern is that non-determinism might be introduced into an actor inadvertently, i.e. the author thinks the actor is deterministic even though it isn't. One of the key design goals of the CAL language was to allow the description of non-deterministic actors, while at the same time permitting tools to identify possible sources of non-determinism, so that they can warn the user about them.
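The standard example, discussed below, is the NDMerge actor, which merges two input streams into one. A minimal sketch:

```
// Either action may fire whenever a token is available on its
// input port; nothing determines the choice when both have tokens.
actor NDMerge () Input1, Input2 ==> Output:
  action Input1: [x] ==> [x] end
  action Input2: [x] ==> [x] end
end
```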
A key consequence of a non-deterministic actor like NDMerge is that during an actual execution, its output may depend on the timing of its input. If both its input queues are empty and NDMerge is waiting for input, then whichever input receives the next token determines which token is copied to the output next. Consequently, the scheduling of activities in the actor network, or the relative speeds of the actors feeding into an actor like NDMerge, may affect the output of the system. This may occasionally be desirable, and at other times it may not. In any event, it is a property that one needs to be aware of.
One way to look at non-determinism of the kind that makes an actor dependent on the precise timing of token arrivals is that such an actor only appears non-deterministic when viewed as an operator on streams, because that view abstracts from the temporal properties of the execution and thus purposefully removes information that determines the sequence in which actions fire. From the perspective of the CAL language this is not entirely accurate, but even so, it is easy to write actors that would not be deterministic even if we knew everything about the timing of the tokens and the actor implementation—such as the following:
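A sketch of such an actor (here called NDSplit; the name is illustrative):

```
// Each input token is copied to either P or N. Both actions are
// always fireable on any input token, so the routing is
// non-deterministic regardless of timing.
actor NDSplit () Input ==> P, N:
  action [x] ==> P: [x] end
  action [x] ==> N: [x] end
end
```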
A deterministic version of this splitter can be written using guarded actions, as in the Split actor sketched below. The guard clause of an action contains a number of expressions that all need to be true in order for the action to be fireable. For the first action to be fireable, the incoming token needs to be greater than or equal to zero, in which case it will be sent to output P. Otherwise that action cannot fire. Conversely, for the second action to be fireable, the token needs to be less than zero, in which case it is sent to output N. A variant whose first guard requires the token to be strictly greater than zero (the SplitDead actor below) runs into trouble as soon as it encounters a zero token, because then none of its actions will ever be able to fire again.
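Sketches of both actors (the guard syntax follows the examples above):

```
// Split routes non-negative tokens to P and negative tokens to N.
// Its guards are exhaustive and disjoint, so it is deterministic.
actor Split () Input ==> P, N:
  action [x] ==> P: [x]
  guard x >= 0
  end
  action [x] ==> N: [x]
  guard x < 0
  end
end

// SplitDead differs only in its first guard: a zero token matches
// neither guard, so it blocks the actor permanently.
actor SplitDead () Input ==> P, N:
  action [x] ==> P: [x]
  guard x > 0
  end
  action [x] ==> N: [x]
  guard x < 0
  end
end
```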
It is not illegal to write actors that terminate on some input, and in fact it may be important to have a few of those in some systems, but it is a pitfall that one needs to be aware of. Secondly, note that the guard conditions of Split are not only exhaustive but also disjoint: for any input token, exactly one of them is true, which is what makes the actor deterministic.
Finally, note that guard conditions can "peek" at the incoming tokens without actually consuming them — if the guards happen to be false, or the action is not fired for some other reason, and the token is not consumed by another action, then it remains where it is and will be available for the next firing. (Or it will remain there forever, as in the case of the zero token in front of SplitDead, which is never removed because the actor is dead.)
The Select actor below is another example of the use of guarded actions. It is similar to the NDMerge actor in the sense that it merges two streams (the ones arriving at its A and B input ports). However, it does so according to the (Boolean) values of the tokens arriving at its S input port.
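A sketch of Select (following the description above):

```
// The Boolean token on S selects which input the next data
// token is read from; reading S and the data token is one step.
actor Select () S, A, B ==> Output:
  action S: [sel], A: [v] ==> [v]
  guard sel
  end
  action S: [sel], B: [v] ==> [v]
  guard not sel
  end
end
```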
In all the actors so far, nothing an action firing did would in any way affect subsequent firings of actions of the same actor. Using state variables, action firings can leave information behind for subsequent firings of either the same or a different action of the same actor. The way this actor is written, the selection of the next input token and the actual copying of the token to the output is one atomic step.
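By contrast, here is a sketch of an actor (IterSelect, discussed next) that uses a state variable to first read the selector token and then, in a second step, copy the chosen data token:

```
// state = 0: waiting for a selector token on S
// state = 1: copy the next token from A; state = 2: from B
actor IterSelect () S, A, B ==> Output:
  state := 0;

  action S: [sel] ==>
  guard state = 0
  do
    if sel then state := 1; else state := 2; end
  end

  action A: [v] ==> [v]
  guard state = 1
  do
    state := 0;
  end

  action B: [v] ==> [v]
  guard state = 2
  do
    state := 0;
  end
end
```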
Note that Select and IterSelect are almost, but not entirely, equivalent. First of all, IterSelect makes twice as many steps in order to process the same number of tokens. Secondly, it actually reads, and therefore consumes, the S input token, irrespective of whether a matching data token is available on A or B.
The IterSelect actor of the previous section illustrated the use of state to control the selection of actions. This is an extremely common thing to do in practice, and the CAL language provides special syntax for this purpose in the form of schedules. Conceptually, one can think of schedules as codifying a particular pattern of using a state variable—they do not add anything to the language in terms of expressiveness. The rationale for using schedules is twofold: first, they make the sequencing of actions explicit, and thus easier to read than scattered updates of and conditions on a state variable; second, an explicit state machine is easier for tools to analyze and to generate efficient implementations from.
Each state transition consists of three parts: the original state, a list of action tags, and the following state. One thing worth noting is that the number of actions has increased—instead of the original three, the new version with the schedule now has four actions. The reason is that an action can no longer directly assign the successor state, as it did in the original, where depending on the value of the token read state would be assigned either the value 1 or 2. In the version with a schedule, that state modification is implicit in the structure of the state machine, and it happens depending on which action fires. Accordingly, the condition that checks the value of the token has moved from within the body of the action to the guards of the two actions tagged readT and readF.
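A sketch of IterSelect rewritten with a schedule (the tags and state names are illustrative):

```
// The schedule is a finite state machine over tagged actions;
// the successor state is implied by which action fires.
actor IterSelect () S, A, B ==> Output:
  readT: action S: [sel] ==> guard sel end
  readF: action S: [sel] ==> guard not sel end
  copyA: action A: [v] ==> [v] end
  copyB: action B: [v] ==> [v] end

  schedule fsm init:
    init  (readT) --> waitA;
    init  (readF) --> waitB;
    waitA (copyA) --> init;
    waitB (copyB) --> init;
  end
end
```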
Consider, for example, an actor that processes a stream of data tokens on one input port and occasionally receives a new configuration token on a second port. As long as it has input on only one of its input ports, everything is unambiguous. But, just like NDMerge, as soon as input is available on both input ports, it could fire either of its two actions, and there is nothing in that actor specification which would predispose it to choose one over the other.
None of the language constructs so far would allow us to express such a preference. Unlike schedules, which could be regarded as syntactic sugar because they can be reduced to existing elements of the language (state variables, guards, and assignments), this situation does in fact require a true extension: action priorities. The basic idea is to add a number of inequalities that relate actions with respect to their firing precedence.
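A sketch of such an actor (the actor name, ports, and computation are hypothetical; only the config and process tags and the priority block follow the discussion below):

```
// Whenever a configuration token is available, the config action
// takes precedence over processing further data tokens.
actor Offset () Data, Config ==> Out:
  c := 0;

  config: action Config: [k] ==>
  do
    c := k;
  end

  process: action Data: [x] ==> [x + c] end

  priority
    config > process;
  end
end
```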
Just as in the case of schedules, we use action tags to identify actions that we want to refer to later on—this time within the priority inequality. The priority block contains only one such inequality, relating the action tagged config to the one tagged process, giving the former priority over the latter. Of course, even this version is still very much timing-dependent. In this case, that need not be a problem, and in fact is probably a requirement for this actor to perform its function. But in general, it is important to understand that priorities, especially when used as in the previous example, need to be well-understood to yield the correct results. Especially when information about the timing of the communication within the network is vague, it is probably best to think of them as strong implementation directives.
The previous sections focused primarily on those constructs in CAL that are related to actor-specific concepts—token input and output, actions, controlling the action selection and so forth. This section discusses the more "pedestrian" parts of CAL, the statements and expressions used to manipulate data objects and express (sequential) algorithms. This part of the language is similar to what can be found in many procedural programming languages (such as C, Pascal, Java, Ada), so we will focus on areas that might be slightly different in CAL.
Unlike languages such as C, CAL makes a strong distinction between statements and expressions. They have very distinct roles, very distinct meanings, and they can never be used interchangeably. An expression in CAL is a piece of code whose sole purpose is to compute a value. We also say that an expression has a value, or that it evaluates to a value. For most expressions, the value that they evaluate to will depend on the values of one or more variables at the time when the expression is evaluated. Since variable values may change over time, the same expression may have different values when evaluated at different points in time.
Probably the most fundamental expressions are constants. Another group of basic expressions is variable references; syntactically, a variable name is a sequence of letters and digits. One important property of expressions is that they are guaranteed not to change variables (we also say they have no side effects)—consequently, within an expression, multiple references to the same variable will always yield the same result.
CAL provides operators of two kinds to build expressions: unary and binary. A unary operator in CAL is always a prefix operator, i.e. it appears before its single operand. A binary operator occurs between its two operands.
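A few illustrative expressions, shown inside assignments (the variable names are hypothetical; CAL uses = for comparison and := for assignment):

```
y := -x + 2 * z;                   // unary prefix and binary operators
ok := a > 0 and not (b = 0);       // Boolean operators
m := if a > b then a else b end;   // a conditional expression
```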
In some ways, statements in CAL are just the opposite of expressions: they do not have a ”return value”, but they can change the values of variables. Indeed, changing the values of variables is the whole point of statements. Statements are executed in strict sequential order, and unless otherwise specified, the execution of statements proceeds in the order in which they appear in the program text, which means that any variable changes produced by a statement may affect the execution of subsequent statements.
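A short sketch of statement sequencing (computing a factorial; the names are illustrative):

```
n := 5;
result := 1;
// each assignment may affect the statements that follow it
while n > 1 do
  result := result * n;
  n := n - 1;
end
```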
As in most other programming languages, there are constructs to control the order in which the statements within a program are executed. The part of this loop that directly follows the foreach keyword is a generator, much like those in list comprehensions.
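A sketch of a foreach loop (the generator syntax follows the examples above; the comprehension in the comment is an assumed equivalent form):

```
sum := 0;
foreach k in [1, 2, 3, 4, 5] do
  sum := sum + k;
end
// a similar generator appears inside a list comprehension:
// squares := [k * k : for k in [1, 2, 3, 4, 5]];
```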
AWK is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems.
Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction. It is also used in the verification of analog circuits and mixed-signal circuits, as well as in the design of genetic circuits. In 2009, the Verilog standard was merged into the SystemVerilog standard, creating IEEE Standard 1800-2009. Since then, Verilog has been officially part of the SystemVerilog language. The current version is IEEE standard 1800-2023.
Lexical tokenization is the conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In the case of a natural language, those categories include nouns, verbs, adjectives, punctuation, etc. In the case of a programming language, the categories include identifiers, operators, grouping symbols, and data types. Lexical tokenization is related to the type of tokenization used in large language models (LLMs), but with two differences. First, lexical tokenization is usually based on a lexical grammar, whereas LLM tokenizers are usually probability-based. Second, LLM tokenizers perform a second step that converts the tokens into numerical values.
Curry is a declarative programming language, an implementation of the functional logic programming paradigm, and is based on the Haskell language. It merges elements of functional and logic programming, including constraint programming integration.
In computer programming, a parameter or a formal argument is a special kind of variable used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. These pieces of data are the values of the arguments with which the subroutine is going to be called/invoked. An ordered list of parameters is usually included in the definition of a subroutine, so that, each time the subroutine is called, its arguments for that call are evaluated, and the resulting values can be assigned to the corresponding parameters.
In computer programming, an assignment statement sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable. In most imperative programming languages, the assignment statement is a fundamental construct.
BLISS is a system programming language developed at Carnegie Mellon University (CMU) by W. A. Wulf, D. B. Russell, and A. N. Habermann around 1970. It was perhaps the best known system language until C debuted a few years later. Since then, C became popular and common, and BLISS faded into obscurity. When C was in its infancy, a few projects within Bell Labs debated the merits of BLISS vs. C.
TI-BASIC is the official name of a BASIC-like language built into Texas Instruments' graphing calculators. TI-BASIC is a language family of three different and incompatible versions, released on different products: TI-BASIC 83 (on the Z80-based TI-83 series), TI-BASIC 89 (on the 68k-based TI-89 series), and TI-BASIC Nspire (on the TI-Nspire series).
In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term datastream instead of dataflow to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.
Esterel is a synchronous programming language for the development of complex reactive systems. The imperative programming style of Esterel allows the simple expression of parallelism and preemption. As a consequence, it is well suited for control-dominated model designs.
Dataflow architecture is a dataflow-based computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures have no program counter, in concept: the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions, so that the order of instruction execution may be hard to predict.
Lucid is a dataflow programming language designed to experiment with non-von Neumann programming models. It was designed by Bill Wadge and Ed Ashcroft and described in the 1985 book Lucid, the Dataflow Programming Language.
A Kahn process network is a distributed model of computation in which a group of deterministic sequential processes communicate through unbounded first in, first out channels. The model requires that reading from a channel is blocking while writing is non-blocking. Due to these key restrictions, the resulting process network exhibits deterministic behavior that does not depend on the timing of computation nor on communication delays.
Binary Modular Dataflow Machine (BMDFM) is a software package that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.
In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.
In digital logic design, an asynchronous circuit is quasi delay-insensitive (QDI) when it operates correctly, independent of gate and wire delay, with the weakest exception necessary to be Turing-complete.
In computer programming, flow-based programming (FBP) is a programming paradigm that defines applications as networks of black box processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
PROMELA is a verification modeling language introduced by Gerard J. Holzmann. The language allows for the dynamic creation of concurrent processes to model, for example, distributed systems. In PROMELA models, communication via message channels can be defined to be synchronous or asynchronous. PROMELA models can be analyzed with the SPIN model checker to verify that the modeled system produces the desired behavior. An implementation verified with Isabelle/HOL is also available as part of the Computer Aided Verification of Automata (CAVA) project. Files written in Promela traditionally have a .pml file extension.
In computing, reactive programming is a declarative programming paradigm concerned with data streams and the propagation of change. With this paradigm, it is possible to express static or dynamic data streams with ease, and also communicate that an inferred dependency within the associated execution model exists, which facilitates the automatic propagation of the changed data flow.
SuperPascal is an imperative, concurrent computing programming language developed by Per Brinch Hansen. It was designed as a publication language: a thinking tool to enable the clear and concise expression of concepts in parallel programming. This is in contrast with implementation languages which are often complicated with machine details and historical conventions. It was created to address the need at the time for a parallel publication language. Arguably, few languages today are expressive and concise enough to be used as thinking tools.