Kahn process networks

Last updated November 09, 2024

A Kahn process network (KPN, or process network) is a distributed model of computation in which a group of deterministic sequential processes communicate through unbounded first in, first out (FIFO) channels. The model requires that reading from a channel is blocking while writing is non-blocking. Due to these key restrictions, the resulting process network exhibits deterministic behavior that depends neither on the timing of computation nor on communication delays.

Kahn process networks were originally developed for modeling parallel programs, but have proven convenient for modeling embedded systems, high-performance computing systems, signal processing systems, stream processing systems, dataflow programming languages, and other computational tasks. KPNs were introduced by Gilles Kahn in 1974.^[1]

Execution model

KPN is a common model for describing signal processing systems in which infinite streams of data are incrementally transformed by processes executing in sequence or in parallel. Although the processes are conceptually parallel, executing the model requires neither multitasking nor parallel hardware.

In a KPN, processes communicate via unbounded FIFO channels. Processes read and write atomic data elements, alternatively called tokens, from and to channels. Writing to a channel is non-blocking, i.e. it always succeeds and does not stall the process, while reading from a channel is blocking, i.e. a process that reads from an empty channel will stall and can only continue when the channel contains sufficient data items (tokens). Processes are not allowed to test an input channel for existence of tokens without consuming them. A FIFO cannot be consumed by multiple processes, nor can multiple processes write to a single FIFO. Given a specific input (token) history for a process, the process must be deterministic so that it always produces the same outputs (tokens). Timing or execution order of processes must not affect the result and therefore testing input channels for tokens is forbidden.
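The channel rules above can be illustrated with a minimal sketch, assuming Python threads as processes and `queue.Queue` objects as channels. The `END` marker, the function names and the three-process network are hypothetical choices made for this illustration; the KPN model itself works on infinite streams and has no end-of-stream token.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of the KPN model


def source(out, values):
    """Pure data source: no input channels, one output channel."""
    for v in values:
        out.put(v)  # write: never blocks because the queue is unbounded
    out.put(END)


def adder(in_a, in_b, out):
    """Reads one token from A, then one from B, and writes their sum to C."""
    while True:
        a = in_a.get()  # read: blocks until a token is available
        b = in_b.get()
        if a is END or b is END:
            out.put(END)
            return
        out.put(a + b)


def run_network():
    # maxsize=0 (the default) makes each queue unbounded.
    # Every channel has exactly one writer and one reader.
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()
    procs = [
        threading.Thread(target=source, args=(a, [1, 2, 3])),
        threading.Thread(target=source, args=(b, [10, 20, 30])),
        threading.Thread(target=adder, args=(a, b, c)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    result = []
    while True:
        tok = c.get()
        if tok is END:
            return result
        result.append(tok)


print(run_network())  # [11, 22, 33]
```

The processes never call `empty()` or `get_nowait()`, so they cannot observe whether a token has arrived yet; that is what keeps the result independent of thread timing.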

Notes on processes

A process need not read any input or have any input channels, as it may act as a pure data source.
A process need not write any output or have any output channels.
Testing input channels for emptiness (or non-blocking reads) could be allowed for optimization purposes, but it should not affect outputs. It can be beneficial and/or possible to do something in advance rather than wait for a channel. For example, assume there were two reads from different channels. If the first read would stall (wait for a token) but the second read could succeed directly, it could be beneficial to read the second one first to save time, because the reading itself often consumes some time (e.g. time for memory allocation or copying).

Process firing semantics as Petri nets

Assuming process P in the KPN above is constructed so that it first reads data from channel A, then channel B, computes something and then writes data to channel C, the execution model of the process can be modeled with the Petri net shown on the right.^[2] The single token in the PE resource place forbids the process from executing simultaneously for different input data. When data arrives at channel A or B, tokens are placed into places FIFO A and FIFO B respectively. The transitions of the Petri net are associated with the respective I/O operations and computation. When the data has been written to channel C, the PE resource place is restored to its initial marking, allowing new data to be read.
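The firing rule described here can be sketched as a small token game. This is an illustration under assumptions: the places `FIFO_A`, `FIFO_B`, `FIFO_C` and `PE` follow the text, while the intermediate places (`has_A`, `has_AB`, `result`) and the transition names are hypothetical, since the figure itself is not reproduced here.

```python
# Each transition maps to (tokens consumed per place, tokens produced per place).
TRANSITIONS = {
    "read_A":  ({"FIFO_A": 1, "PE": 1}, {"has_A": 1}),
    "read_B":  ({"FIFO_B": 1, "has_A": 1}, {"has_AB": 1}),
    "compute": ({"has_AB": 1}, {"result": 1}),
    "write_C": ({"result": 1}, {"FIFO_C": 1, "PE": 1}),
}


def enabled(marking, name):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, name):
    """Return the marking obtained by firing one enabled transition."""
    inputs, outputs = TRANSITIONS[name]
    new = dict(marking)
    for place, n in inputs.items():
        new[place] -= n
    for place, n in outputs.items():
        new[place] = new.get(place, 0) + n
    return new


def run(marking):
    """Fire transitions until none is enabled; return final marking and trace."""
    fired = []
    while True:
        ready = [t for t in TRANSITIONS if enabled(marking, t)]
        if not ready:
            return marking, fired
        marking = fire(marking, ready[0])
        fired.append(ready[0])


final, fired = run({"FIFO_A": 2, "FIFO_B": 1, "PE": 1})
print(fired)  # ['read_A', 'read_B', 'compute', 'write_C', 'read_A']
```

With two tokens on A and one on B, the process completes one full firing and then stalls after the second read of A, because `FIFO_B` is empty. The second `read_A` only becomes enabled once `write_C` has returned the token to `PE`.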

Process as a finite state machine

A process can be modeled as a finite state machine that is in one of two states:

Active; the process computes or writes data
Wait; the process is blocked (waiting) for data

Assuming the finite state machine reads program elements associated with the process, it may read three kinds of tokens, which are "Compute", "Read" and "Write token". Additionally, in the Wait state it can only come back to Active state by reading a special "Get token" which means the communication channel associated with the wait contains readable data.
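One possible reading of this state machine is sketched below as a transition table. It assumes that a "Read" element always moves the process to Wait and that the "Get token" (shortened to `"Get"` here) moves it back to Active; the table layout and the `trace` helper are hypothetical.

```python
# (current state, program element) -> next state
TRANSITIONS = {
    ("Active", "Compute"): "Active",
    ("Active", "Write"): "Active",  # writes never block
    ("Active", "Read"): "Wait",     # assumed: a read waits for the channel
    ("Wait", "Get"): "Active",      # the channel now contains readable data
}


def trace(tokens, state="Active"):
    """Return the sequence of states visited while reading the given elements."""
    states = [state]
    for tok in tokens:
        # A KeyError means the element is not allowed in the current state.
        state = TRANSITIONS[(state, tok)]
        states.append(state)
    return states


print(trace(["Read", "Get", "Compute", "Write"]))
# ['Active', 'Wait', 'Active', 'Active', 'Active']
```

The only entry for the Wait state is the "Get" element, which matches the statement that a waiting process can resume only when its channel has data.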

Properties

Boundedness of channels

A channel is strictly bounded by $b$ if it has at most $b$ unconsumed tokens for any possible execution. A KPN is strictly bounded by $b$ if all channels are strictly bounded by $b$.

The number of unconsumed tokens depends on the execution order (scheduling) of processes. A spontaneous data source could produce arbitrarily many tokens into a channel if the scheduler did not execute the processes consuming those tokens.
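The dependence of channel occupancy on scheduling can be seen in a minimal single-threaded sketch. It assumes one producer and one consumer sharing one channel, with the schedule given as a string of `P` (run the producer for one write) and `C` (run the consumer for one read); all names are hypothetical.

```python
from collections import deque


def simulate(schedule):
    """Return (tokens consumed in order, peak number of unconsumed tokens)."""
    channel = deque()
    next_value = 0
    consumed = []
    peak = 0
    for step in schedule:
        if step == "P":
            channel.append(next_value)  # non-blocking write
            next_value += 1
        elif step == "C" and channel:
            consumed.append(channel.popleft())
        # A "C" step on an empty channel is a blocked read: nothing happens.
        peak = max(peak, len(channel))
    return consumed, peak


print(simulate("PCPCPC"))  # ([0, 1, 2], 1)
print(simulate("PPPCCC"))  # ([0, 1, 2], 3)
```

Both schedules deliver the same tokens in the same order, but the second needs three times the buffer space.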

A real application cannot have unbounded FIFOs, and therefore scheduling and the maximum capacity of FIFOs must be designed into a practical implementation. The maximum capacity of FIFOs can be handled in several ways:

FIFO bounds can be mathematically derived at design time to avoid FIFO overflows. This is, however, not possible for all KPNs: testing whether a KPN is strictly bounded by $b$ is an undecidable problem. Moreover, in practical situations, the bound may be data dependent.
FIFO bounds can be grown on demand.^[3]
Blocking writes can be used so that a process blocks if a FIFO is full. This approach may unfortunately lead to an artificial deadlock unless the designer properly derives safe bounds for FIFOs (Parks, 1995). Detecting local artificial deadlocks at run time may be necessary to guarantee the production of the correct output.^[4]
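The artificial deadlock mentioned in the last item can be reproduced with a small sketch. It assumes two hypothetical processes: X writes two tokens to channel A and then one to channel B, while Y reads B first and then A twice. Each process is a fixed list of operations, and the round-robin scheduler is only for illustration.

```python
from collections import deque

PROGRAMS = {
    "X": [("write", "A"), ("write", "A"), ("write", "B")],
    "Y": [("read", "B"), ("read", "A"), ("read", "A")],
}


def completes(programs, capacity):
    """Run with bounded FIFOs and blocking writes; True if every process finishes."""
    channels = {}
    pc = {name: 0 for name in programs}  # next operation index per process
    progress = True
    while progress:
        progress = False
        for name, ops in programs.items():
            if pc[name] == len(ops):
                continue  # process has finished
            op, ch = ops[pc[name]]
            fifo = channels.setdefault(ch, deque())
            if op == "write" and len(fifo) < capacity:
                fifo.append(name)
            elif op == "read" and fifo:
                fifo.popleft()
            else:
                continue  # blocked: full FIFO on write or empty FIFO on read
            pc[name] += 1
            progress = True
    return all(pc[name] == len(ops) for name, ops in programs.items())


print(completes(PROGRAMS, capacity=1))  # False: X is stuck on A, Y is stuck on B
print(completes(PROGRAMS, capacity=2))  # True
```

With unbounded channels this network always finishes, so the deadlock at capacity 1 is caused purely by the bound. Growing channel A to two tokens resolves it.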

Closed and open systems

A closed KPN has no external input or output channels. Processes that have no input channels act as data sources and processes that have no output channels act as data sinks. In an open KPN each process has at least one input and output channel.

Determinism

Processes of a KPN are deterministic: for the same input history they must always produce exactly the same output. Processes can be modeled as sequential programs that read from and write to ports in any order or quantity, as long as the determinism property is preserved. As a consequence, the KPN model is deterministic, so that the following factors entirely determine the outputs of the system:

processes
the network
initial tokens

Hence, timing of the processes does not affect outputs of the system.
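A minimal sketch of this property, assuming processes written as Python generators that yield read and write requests to a toy scheduler. The scheduler picks a runnable process at random, so each seed gives a different interleaving; the network (two sources feeding an adder) and all names are hypothetical.

```python
import random
from collections import deque


def source(out, values):
    for v in values:
        yield ("write", out, v)


def adder(in_a, in_b, out, count):
    for _ in range(count):
        a = yield ("read", in_a)  # resumes only once a token is available
        b = yield ("read", in_b)
        yield ("write", out, a + b)


def run(seed):
    """Execute the network under a randomly chosen schedule; return channel C."""
    rng = random.Random(seed)
    chans = {"A": deque(), "B": deque(), "C": deque()}
    procs = [
        source("A", [1, 2, 3]),
        source("B", [10, 20, 30]),
        adder("A", "B", "C", 3),
    ]
    pending = {p: next(p) for p in procs}  # each process's next request
    while pending:
        # Writes can always proceed; reads only if the channel is non-empty.
        runnable = [p for p, req in pending.items()
                    if req[0] == "write" or chans[req[1]]]
        if not runnable:
            break  # every remaining process is blocked on a read
        p = rng.choice(runnable)
        req = pending[p]
        try:
            if req[0] == "write":
                chans[req[1]].append(req[2])
                pending[p] = p.send(None)
            else:
                pending[p] = p.send(chans[req[1]].popleft())
        except StopIteration:
            del pending[p]  # process finished
    return list(chans["C"])


print({tuple(run(seed)) for seed in range(20)})  # {(11, 22, 33)}
```

The scheduler never tells a process whether a channel is empty; a read request is simply not resumed until a token exists, which is why the interleaving cannot leak into the output.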

Monotonicity

KPN processes are monotonic: reading more tokens can only lead to writing more tokens. Tokens read in the future can only affect tokens written in the future. In a KPN there is a total order of events inside a signal. However, there is no order relation between events in different signals. Thus, KPNs are only partially ordered, which classifies them as an untimed model.
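Monotonicity is commonly stated in terms of the prefix order on token sequences. As a sketch, using notation that is assumed here rather than taken from the text above: let $F$ be the function a process computes from its input history to its output history, and write $X \sqsubseteq Y$ when the sequence $X$ is a prefix of the sequence $Y$. Then

$$
X \sqsubseteq Y \implies F(X) \sqsubseteq F(Y)
$$

so extending the input can only extend the output and never retracts tokens that have already been written.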

Applications

Due to their high expressiveness and succinctness, KPNs are used as the underlying model of computation in several academic modeling tools to represent streaming applications that have certain properties (e.g., dataflow-oriented, stream-based).

The open source Daedalus framework,^[5] maintained by the Leiden Embedded Research Center at Leiden University, accepts sequential programs written in C and generates a corresponding KPN. The generated KPN can then, for example, be mapped systematically onto an FPGA-based platform.

The Ambric Am2045 massively parallel processor array is a KPN implemented in actual silicon.^[6] Its 336 32-bit processors are connected by a programmable interconnect of dedicated FIFOs. Thus its channels are strictly bounded with blocking writes.

The AI Engines in some AMD Xilinx Versal devices are building blocks of a Kahn process network.^[7]

References

1. Kahn, G. (1974). Rosenfeld, Jack L. (ed.). "The semantics of a simple language for parallel programming". Proc. IFIP Congress on Information Processing. North-Holland. ISBN 0-7204-2803-3.
2. Bernardeschi, C.; De Francesco, N.; Vaglini, G. (1995). "A Petri nets semantics for data flow networks". Acta Informatica. 32 (4): 347–374. doi:10.1007/BF01178383.
3. Parks, Thomas M. (1995). Bounded Scheduling of Process Networks (Ph.D. thesis). University of California, Berkeley.
4. Geilen, Marc; Basten, Twan (2003). Degano, P. (ed.). "Requirements on the Execution of Kahn Process Networks". Proc. 12th European Symposium on Programming Languages and Systems (ESOP). Springer. pp. 319–334. CiteSeerX 10.1.1.12.7148.
5. LIACS Daedalus framework. http://daedalus.liacs.nl
6. Butts, Mike; Jones, Anthony Mark; Wasson, Paul (April 2007). "A Structural Object Programming Model, Architecture, Chip and Tools for Reconfigurable Computing". Proceedings of FCCM. IEEE Computer Society.
7. AMD Xilinx. AI Engine Tools and Flows. UG1076 (v2022.2), October 19, 2022, p. 11.