Duncan's taxonomy

Last updated December 18, 2023

Duncan's taxonomy is a classification of computer architectures, proposed by Ralph Duncan in 1990.^[1] Duncan suggested modifications to Flynn's taxonomy ^[2] to include pipelined vector processes.^[3]

Taxonomy

The taxonomy was developed during 1988-1990 and was first published in 1990. Its original categories are indicated below.

Synchronous architectures

This category includes all the parallel architectures that coordinate concurrent execution in lockstep fashion and do so via mechanisms such as global clocks, central control units or vector unit controllers. Further subdivision of this category is made primarily on the basis of the synchronization mechanism.^[1]

Pipelined vector processors

Pipelined vector processors are characterized by pipelined functional units that accept a sequential stream of array or vector elements, such that different stages in a filled pipeline are processing different elements of the vector at a given time.^[4] Parallelism is provided both by the pipelining in individual functional units described above, as well as by operating multiple units of this kind in parallel and by chaining the output of one unit into another unit as input.^[4]

Vector architectures that stream vector elements into functional units from special vector registers are termed register-to-register architectures, while those that feed functional units from special memory buffers are designated as memory-to-memory architectures.^[1] Early examples of register-to-register architectures from the 1960s and early 1970s include the Cray-1 ^[5] and Fujitsu VP-200, while the Control Data Corporation STAR-100, CDC 205 and the Texas Instruments Advanced Scientific Computer are early examples of memory-to-memory vector architectures.^[6]

The late 1980s and early 1990s saw the introduction of vector architectures, such as the Cray Y-MP/4 and Nippon Electric Corporation SX-3 that supported 4-10 vector processors with a shared memory (see NEC SX architecture). RISC-V RVV may mark the beginning of the modern revival of Vector processing.^{[ speculation? ]}

SIMD

This scheme uses the SIMD (single instruction stream, multiple data stream) category from Flynn's taxonomy as a root class for processor array and associative memory subclasses. SIMD architectures^[7] are characterized by having a control unit broadcast a common instruction to all processing elements, which execute that instruction in lockstep on diverse operands from local data. Common features include the ability for individual processors to disable an instruction and the ability to propagate instruction results to immediate neighbors over an interconnection network.

Processor array

Associative memory

Systolic array

Systolic arrays, proposed during the 1980s,^[8] are multiprocessors in which data and partial results are rhythmically pumped from processor to processor through a regular, local interconnection network.^[1] Systolic architectures use a global clock and explicit timing delays to synchronize data flow from processor to processor.^[1] Each processor in a systolic system executes an invariant sequence of instructions before data and results are pulsed to neighboring processors.^[8]

MIMD architectures

Based on Flynn's multiple-instruction-multiple-data streams terminology, this category spans a wide spectrum of architectures in which processors execute multiple instruction sequences on (potentially) dissimilar data streams without strict synchronization. Although both instruction and data streams can be different for each processor, they need not be. Thus, MIMD architectures can run identical programs that are in various stages at any given time, run unique instruction and data streams on each processor or execute a combination of each these scenarios. This category is subdivided further primarily on the basis of memory organization.^[1]

Distributed memory

Shared memory

MIMD-paradigm architectures

The MIMD-based paradigms category subsumes systems in which a specific programming or execution paradigm is at least as fundamental to the architectural design as structural considerations are. Thus, the design of dataflow architectures and reduction machines is as much the product of supporting their distinctive execution paradigm as it is a product of connecting processors and memories in MIMD fashion. The category's subdivisions are defined by these paradigms.^[1]

MIMD/SIMD hybrid

Dataflow machine

Reduction machine

Wavefront array

Related Research Articles

A central processing unit (CPU)—also called a central processor or main processor—is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Superscalar processor</span> CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called vectors. This is in contrast to scalar processors, whose instructions operate on single data items only, and in contrast to some of those same scalar processors having additional single instruction, multiple data (SIMD) or SWAR Arithmetic Units. Vector processors can greatly improve performance on certain workloads, notably numerical simulation and similar tasks. Vector processing techniques also operate in video-game console hardware and in graphics accelerators.

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

<span class="mw-page-title-main">Multiple instruction, multiple data</span> Computing technique employed to achieve parallelism

In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in the design of modern processors and their functionalities. Since the rise of multiprocessing central processing units (CPUs), a multiprogramming context has evolved as an extension of the classification system. Vector processing, covered by Duncan's taxonomy, is missing from Flynn's work because the Cray-1 was released in 1977: Flynn's second paper was published in 1972.

In parallel computer architectures, a systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in Colossus, which was an early computer used to break German Lorenz ciphers during World War II. Due to the classified nature of Colossus, they were independently invented or rediscovered by H. T. Kung and Charles Leiserson who described arrays for many dense linear algebra computations for banded matrices. Early applications include computing greatest common divisors of integers and polynomials. They are sometimes classified as multiple-instruction single-data (MISD) architectures under Flynn's taxonomy, but this classification is questionable because a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories: SISD, SIMD, MISD, MIMD, as discussed later in this article.

A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory, but may in some cases be assigned a memory address e.g. DEC PDP-10, ICT 1900.

The CDC STAR-100 is a vector supercomputer that was designed, manufactured, and marketed by Control Data Corporation (CDC). It was one of the first machines to use a vector processor to improve performance on appropriate scientific applications. It was also the first supercomputer to use integrated circuits and the first to be equipped with one million words of computer memory.

<span class="mw-page-title-main">Multiple instruction, single data</span>

In computing, multiple instruction, single data (MISD) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault tolerance executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Applications for this architecture are much less common than MIMD and SIMD, as the latter two are often more appropriate for common data parallel techniques. Specifically, they allow better scaling and use of computational resources. However, one prominent example of MISD in computing are the Space Shuttle flight control computers.

In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting parallelism where-by multiple processors cooperate in the execution of a program in order to obtain results faster.

<span class="mw-page-title-main">Single instruction, single data</span>

In computing, single instruction stream, single data stream (SISD) is a computer architecture in which a single uni-core processor executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von Neumann architecture.

The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North America to provide PlayStation 2 game support. Mass production of the Emotion Engine began in 1999 and ended in late 2012 with the discontinuation of the PlayStation 2.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality: how well a range of different problems can be expressed for a variety of different architectures, and its performance: how efficiently the compiled programs can execute. The implementation of a parallel programming model can take the form of a library invoked from a sequential language, as an extension to an existing language, or as an entirely new language.

In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.

The Fujitsu FR-V is one of the very few processors ever able to process both a very long instruction word (VLIW) and vector processor instructions at the same time, increasing throughput with high parallel computing while increasing performance per watt and hardware efficiency. The family was presented in 1999. Its design was influenced by the VPP500/5000 models of the Fujitsu VP/2000 vector processor supercomputer line.

Scalar processors are a class of computer processors that process only one data item at a time. Typical data items include integers and floating point numbers.

SIMD within a register (SWAR), also known by the name "packed SIMD" is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing".

References

1 2 3 4 5 6 7 Duncan, Ralph, "A Survey of Parallel Computer Architectures", IEEE Computer. February 1990, pp. 5-16.
↑ Flynn, M.J., "Very High Speed Computing Systems", Proc. IEEE. Vol. 54, 1966, pp.1901-1909.
↑ Introduction to Parallel Algorithms
1 2 Hwang, K., ed., Tutorial Supercomputers: Design and Applications. Computer Society Press, Los Alamitos, California, 1984, esp. chapters 1 and 2.
↑ Russell, R.M., "The CRAY-1 Computer System," Comm. ACM, Jan. 1978, pp. 63-72.
↑ Watson, W.J., The ASC: a Highly Modular Flexible Super Computer Architecture, Proc. AFIPS Fall Joint Computer Conference, 1972, pp. 221-228.
↑ Michael Jurczyk and Thomas Schwederski,"SIMD-Processing: Concepts and Systems", pp. 649-679 in Parallel and Distributed Computing Handbook, A. Zomaya, ed., McGraw-Hill, 1996.
1 2 Kung, H.T., "Why Systolic Arrays?", Computer, Vol. 15, No. 1, Jan. 1982, pp. 37-46.

C Xavier and S S Iyengar, Introduction to Parallel Programming

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[duncan-1] 1 2 3 4 5 6 7 Duncan, Ralph, "A Survey of Parallel Computer Architectures", IEEE Computer. February 1990, pp. 5-16.

[flynn-2] Flynn, M.J., "Very High Speed Computing Systems", Proc. IEEE. Vol. 54, 1966, pp.1901-1909.

[vhsc-3] Introduction to Parallel Algorithms

[Hwang84-4] 1 2 Hwang, K., ed., Tutorial Supercomputers: Design and Applications. Computer Society Press, Los Alamitos, California, 1984, esp. chapters 1 and 2.

[Russell78-5] Russell, R.M., "The CRAY-1 Computer System," Comm. ACM, Jan. 1978, pp. 63-72.

[Watson72-6] Watson, W.J., The ASC: a Highly Modular Flexible Super Computer Architecture, Proc. AFIPS Fall Joint Computer Conference, 1972, pp. 221-228.

[simd96-7] Michael Jurczyk and Thomas Schwederski,"SIMD-Processing: Concepts and Systems", pp. 649-679 in Parallel and Distributed Computing Handbook, A. Zomaya, ed., McGraw-Hill, 1996.

[HTKung-8] 1 2 Kung, H.T., "Why Systolic Arrays?", Computer, Vol. 15, No. 1, Jan. 1982, pp. 37-46.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]