Flynn's taxonomy

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 [1] and extended in 1972. [2] The classification system has persisted and has been used as a tool in the design of modern processors and their functionalities. Since the rise of multiprocessing central processing units (CPUs), a multiprogramming context has evolved as an extension of the classification system. Vector processing, covered by Duncan's taxonomy, [3] is absent from Flynn's work because the Cray-1 was not released until 1977, five years after Flynn's second paper.

Classifications

The four initial classifications defined by Flynn are based upon the number of concurrent instruction (or control) streams and data streams available in the architecture. [4] Flynn defined three additional sub-categories of SIMD in 1972. [2]

Single instruction stream, single data stream (SISD)

A sequential computer which exploits no parallelism in either the instruction or data streams. A single control unit (CU) fetches a single instruction stream (IS) from memory. The CU then generates appropriate control signals to direct a single processing element (PE) to operate on a single data stream (DS), i.e., one operation at a time.

Examples of SISD architectures are the traditional uniprocessor machines like older personal computers (PCs) (by 2010, many PCs had multiple cores) and mainframe computers.
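
For illustration, here is a minimal C sketch of SISD-style execution: a single instruction stream applies one operation to one data element at a time. The array contents and loop are illustrative assumptions, not taken from the sources cited above.

    #include <stdio.h>

    int main(void) {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];

        /* One control stream, one data element per operation:
           each iteration issues a single add on a single pair of values. */
        for (int i = 0; i < 8; i++)
            c[i] = a[i] + b[i];

        printf("c[0] = %.1f\n", c[0]);
        return 0;
    }

The SIMD examples further down perform the same element-wise addition, which makes the contrast between one-at-a-time and data-parallel execution easier to see.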

Single instruction stream, multiple data streams (SIMD)

A single instruction is simultaneously applied to multiple different data streams. Instructions can be executed sequentially, such as by pipelining, or in parallel by multiple functional units. Flynn's 1972 paper subdivided SIMD into three further categories: [2]

Array processor

The modern term for an array processor is "single instruction, multiple threads" (SIMT). This is a distinct classification in Flynn's 1972 taxonomy, as a subcategory of SIMD. It is identifiable by the parallel subelements having their own independent register file and memory (cache and data memory). Flynn's original papers cite two historic examples of SIMT processors: SOLOMON and ILLIAC IV.

Nvidia commonly uses the term in its marketing materials and technical documents, where it argues for the novelty of its architecture. [6] SOLOMON predates Nvidia by decades.

The Aspex Microelectronics Associative String Processor (ASP) [7] categorised itself in its marketing material as "massive wide SIMD", but it had bit-level ALUs and bit-level predication (Flynn's taxonomy: associative processing), and each of the 4096 processors had its own registers and memory (Flynn's taxonomy: array processing). The Linedancer, released in 2010, contained 4096 2-bit predicated SIMD ALUs, each with its own content-addressable memory, and was capable of 800 billion instructions per second. [8] Aspex's ASP associative-array SIMT processor predates NVIDIA by 20 years. [9] [10]
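
As a rough illustration of the SIMT idea (one broadcast instruction, many processing elements each with private registers and local memory), the following C sketch simulates the model with ordinary arrays and loops. The per-element state, element count, and operations are illustrative assumptions, not a description of any of the machines named above.

    #include <stdio.h>

    #define NUM_PE 8   /* number of simulated processing elements */

    /* Each simulated PE has its own "register" and its own "local memory". */
    struct pe_state {
        float reg;
        float local_mem[4];
    };

    int main(void) {
        struct pe_state pe[NUM_PE];

        /* Load: the same load "instruction" is broadcast to every PE,
           but each PE reads from its own local memory. */
        for (int i = 0; i < NUM_PE; i++) {
            pe[i].local_mem[0] = (float)i;
            pe[i].reg = pe[i].local_mem[0];
        }

        /* Compute: one multiply instruction, executed by all PEs in lock-step
           on their private registers. */
        for (int i = 0; i < NUM_PE; i++)
            pe[i].reg *= 2.0f;

        printf("PE 3 register = %.1f\n", pe[3].reg);
        return 0;
    }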

Pipelined processor

At the time that Flynn wrote his 1972 paper, many systems were using main memory as the resource from which pipelines were reading and writing. When the resource that all "pipelines" read from and write to is the register file rather than main memory, modern variants of SIMD result. Examples include AltiVec, NEON, and AVX.

An alternative name for this type of register-based SIMD is "packed SIMD", [11] and another is SIMD within a register (SWAR). When predication is applied, it becomes associative processing (below).
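
As a minimal sketch of packed (register-based) SIMD, the following C fragment uses x86 SSE intrinsics to add four single-precision floats with one instruction operating on 128-bit registers. It assumes an x86 compiler with <immintrin.h> available; the data values are illustrative.

    #include <stdio.h>
    #include <immintrin.h>   /* x86 SIMD intrinsics (SSE and later) */

    int main(void) {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);      /* pack four floats into one register */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);   /* one instruction, four additions */
        _mm_storeu_ps(c, vc);

        printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }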

Associative processor

The modern term for associative processor is "predicated" (or masked) SIMD. Examples include AVX-512.
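
A short C sketch of predicated (masked) SIMD using AVX-512 intrinsics: a mask register decides which of the sixteen lanes an addition is applied to, and the remaining lanes are left unchanged. This assumes a CPU and compiler with AVX-512F support (e.g. gcc -mavx512f); the mask value and data are illustrative.

    #include <stdio.h>
    #include <immintrin.h>   /* AVX-512 intrinsics */

    int main(void) {
        float a[16], b[16], c[16];
        for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 100.0f; c[i] = -1.0f; }

        __m512    va   = _mm512_loadu_ps(a);
        __m512    vb   = _mm512_loadu_ps(b);
        __m512    vsrc = _mm512_loadu_ps(c);
        __mmask16 k    = 0x00FF;   /* predicate: only the low eight lanes are active */

        /* Masked add: active lanes get a[i] + b[i], inactive lanes keep c[i]. */
        __m512 vr = _mm512_mask_add_ps(vsrc, k, va, vb);
        _mm512_storeu_ps(c, vr);

        printf("lane 0 = %.1f, lane 15 = %.1f\n", c[0], c[15]);
        return 0;
    }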

Some modern designs (GPUs in particular) combine features of more than one of these subcategories: today's GPUs are SIMT but are also associative, i.e., each processing element in the SIMT array is also predicated.

Multiple instruction streams, single data stream (MISD)

Multiple instructions operate on one data stream. This is an uncommon architecture which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer. [12]
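
As an informal sketch of the fault-tolerance use of MISD, the C fragment below runs three independently written routines on the same input and takes a majority vote on the result. The routines and voting rule are illustrative assumptions, not a description of any real flight computer.

    #include <stdio.h>

    /* Three independently implemented routines computing the same function. */
    static int impl_a(int x) { return x * x; }
    static int impl_b(int x) { int r = 0; for (int i = 0; i < x; i++) r += x; return r; }
    static int impl_c(int x) { return x * x; }

    /* Majority vote over three results; returns -1 if all disagree. */
    static int vote(int r1, int r2, int r3) {
        if (r1 == r2 || r1 == r3) return r1;
        if (r2 == r3) return r2;
        return -1;
    }

    int main(void) {
        int x = 7;   /* the single data stream */
        int result = vote(impl_a(x), impl_b(x), impl_c(x));
        printf("agreed result: %d\n", result);
        return 0;
    }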

Multiple instruction streams, multiple data streams (MIMD)

Multiple autonomous processors simultaneously execute different instructions on different data. MIMD architectures include multi-core superscalar processors and distributed systems, using either one shared memory space or a distributed memory space.
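
A minimal C sketch of MIMD-style execution on a shared-memory multi-core machine, using POSIX threads: two threads run different code on different data at the same time. The thread bodies and data are illustrative; compile with -pthread on a POSIX system.

    #include <stdio.h>
    #include <pthread.h>

    /* One instruction stream: sums an array of integers. */
    static void *sum_task(void *arg) {
        int *v = arg;
        int sum = 0;
        for (int i = 0; i < 4; i++) sum += v[i];
        printf("sum = %d\n", sum);
        return NULL;
    }

    /* A different instruction stream: scales an array of doubles in place. */
    static void *scale_task(void *arg) {
        double *v = arg;
        for (int i = 0; i < 4; i++) v[i] *= 2.0;
        printf("scaled v[0] = %.1f\n", v[0]);
        return NULL;
    }

    int main(void) {
        int    ints[4]    = {1, 2, 3, 4};
        double doubles[4] = {0.5, 1.5, 2.5, 3.5};
        pthread_t t1, t2;

        /* Different instructions, different data, running concurrently. */
        pthread_create(&t1, NULL, sum_task, ints);
        pthread_create(&t2, NULL, scale_task, doubles);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }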

Diagram comparing classifications

[Diagram comparing the four classifications, with each processing unit (PU) shown for a uni-core or multi-core computer.]

Further divisions

As of 2006, all of the top 10 and most of the TOP500 supercomputers were based on a MIMD architecture.

Although these are not part of Flynn's work, some further divide the MIMD category into the two categories below, [13] [14] [15] [16] [17] and even further subdivisions are sometimes considered. [18]

Single program, multiple data streams (SPMD)

Multiple autonomous processors simultaneously executing the same program (but at independent points, rather than in the lockstep that SIMD imposes) on different data. [19] SPMD has also been termed single process, multiple data, [17] although this usage is technically incorrect: SPMD is a parallel execution model and assumes multiple cooperating processors executing a program. SPMD is the most common style of explicit parallel programming. [20] The SPMD model and the term were proposed by Frederica Darema of the RP3 team. [21]
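
MPI is one common realization of the SPMD model: every process runs the same program, and behaviour differs only through the process rank. Below is a minimal C sketch, assuming an MPI implementation is installed (compile with mpicc, run with e.g. mpirun -np 4 ./a.out); the per-rank work split is illustrative.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identity */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        /* Same program everywhere; each rank picks its own slice of the work
           and proceeds at its own pace, not in lockstep. */
        int  chunk_start = rank * 100;
        int  chunk_end   = chunk_start + 100;
        long local_sum   = 0;
        for (int i = chunk_start; i < chunk_end; i++)
            local_sum += i;

        printf("rank %d of %d: partial sum %ld\n", rank, size, local_sum);

        MPI_Finalize();
        return 0;
    }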

Multiple programs, multiple data streams (MPMD)

Multiple autonomous processors simultaneously running at least two independent programs. In HPC contexts, such systems often pick one node to be the "host" (the "explicit host/node programming model") or "manager" (the "manager/worker" strategy), which runs one program that farms out data to all the other nodes, which all run a second program. Those other nodes then return their results directly to the manager. An example of this would be the Sony PlayStation 3 game console, with its SPU/PPU processor.

MPMD is common in non-HPC contexts. For example, the make build system can build multiple dependencies in parallel, using target-dependent programs in addition to the make executable itself. MPMD also often takes the form of pipelines. A simple Unix shell command like ls | grep "A" | more launches three processes running separate programs in parallel with the output of one used as the input to the next.
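
As a small illustration of MPMD in a non-HPC setting, the C fragment below launches just such a pipeline of separate programs from within another program and reads its output. It assumes a POSIX system where the shell, ls, grep, and wc are available; the specific pipeline is illustrative.

    #define _POSIX_C_SOURCE 200809L   /* for popen/pclose declarations */
    #include <stdio.h>

    int main(void) {
        /* Three separate programs (ls, grep, wc) run concurrently,
           each program's output feeding the next one's input. */
        FILE *p = popen("ls | grep \"A\" | wc -l", "r");
        if (p == NULL) {
            perror("popen");
            return 1;
        }

        char line[128];
        while (fgets(line, sizeof line, p) != NULL)
            printf("entries containing \"A\": %s", line);

        pclose(p);
        return 0;
    }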

These are both distinct from the explicit parallel programming used in HPC in that the individual programs are generic building blocks rather than implementing part of a specific parallel algorithm. In the pipelining approach, the amount of available parallelism does not increase with the size of the data set.

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation of that ISA.

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be implemented internally in hardware and can be exposed directly through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called vectors. This is in contrast to scalar processors, whose instructions operate on single data items only, and in contrast to some of those same scalar processors having additional single instruction, multiple data (SIMD) or SIMD within a register (SWAR) arithmetic units. Vector processors can greatly improve performance on certain workloads, notably numerical simulation and similar tasks. Vector processing techniques also operate in video-game console hardware and in graphics accelerators.

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

<span class="mw-page-title-main">Multiple instruction, multiple data</span> Computing technique employed to achieve parallelism

In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processor cores that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.

In computer architecture, predication is a feature that provides an alternative to conditional transfer of control, as implemented by conditional branch machine instructions. Predication works by having conditional (predicated) non-branch instructions associated with a predicate, a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to execute based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.

In parallel computer architectures, a systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in Colossus, which was an early computer used to break German Lorenz ciphers during World War II. Due to the classified nature of Colossus, they were independently invented or rediscovered by H. T. Kung and Charles Leiserson, who described arrays for many dense linear algebra computations for banded matrices. Early applications include computing greatest common divisors of integers and polynomials. They are sometimes classified as multiple-instruction single-data (MISD) architectures under Flynn's taxonomy, but this classification is questionable because a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories: SISD, SIMD, MISD, and MIMD.

<span class="mw-page-title-main">Multiple instruction, single data</span> Parallel computing architecture

In computing, multiple instruction, single data (MISD) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Applications for this architecture are much less common than MIMD and SIMD, as the latter two are often more appropriate for common data-parallel techniques; specifically, they allow better scaling and use of computational resources. However, one prominent example of MISD in computing is the Space Shuttle flight control computers.

In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting parallelism whereby multiple processors cooperate in the execution of a program in order to obtain results faster.

<span class="mw-page-title-main">Single instruction, single data</span>

In computing, single instruction stream, single data stream (SISD) is a computer architecture in which a single uni-core processor executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von Neumann architecture.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality: how well a range of different problems can be expressed for a variety of different architectures, and its performance: how efficiently the compiled programs can execute. The implementation of a parallel programming model can take the form of a library invoked from a programming language, or an extension to an existing language.

In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.

SIMD within a register (SWAR), also known by the name "packed SIMD" is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing".

A massively parallel processor array, also known as a multi purpose processor array (MPPA) is a type of integrated circuit which has a massively parallel array of hundreds or thousands of CPUs and RAM memories. These processors pass work to one another through a reconfigurable interconnect of channels. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more demanding tasks than conventional chips. MPPAs are based on a software parallel programming model for developing high-performance embedded system applications.

Duncan's taxonomy is a classification of computer architectures, proposed by Ralph Duncan in 1990. Duncan suggested modifications to Flynn's taxonomy to include pipelined vector processing.

Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. It is different from SPMD in that all instructions in all "threads" are executed in lock-step. The SIMT execution model has been implemented on several GPUs and is relevant for general-purpose computing on graphics processing units (GPGPU), e.g. some supercomputers combine CPUs with GPUs.

In computing, array of structures (AoS), structure of arrays (SoA), and array of structures of arrays (AoSoA) are contrasting ways to arrange a sequence of records in memory with regard to interleaving; they are of interest in SIMD and SIMT programming.

References

  1. Flynn, Michael J. (December 1966). "Very high-speed computing systems". Proceedings of the IEEE. 54 (12): 1901–1909. doi:10.1109/PROC.1966.5273.
  2. Flynn, Michael J. (September 1972). "Some Computer Organizations and Their Effectiveness" (PDF). IEEE Transactions on Computers. C-21 (9): 948–960. doi:10.1109/TC.1972.5009071. S2CID 18573685.
  3. Duncan, Ralph (February 1990). "A Survey of Parallel Computer Architectures" (PDF). Computer. 23 (2): 5–16. doi:10.1109/2.44900. S2CID 15036692. Archived (PDF) from the original on 2018-07-18. Retrieved 2018-07-18.
  4. "Data-Level Parallelism in Vector, SIMD, and GPU Architectures" (PDF). 12 November 2013.
  5. Flynn, Michael J. (September 1972). "Some Computer Organizations and Their Effectiveness" (PDF). IEEE Transactions on Computers. C-21 (9): 948–960. doi:10.1109/TC.1972.5009071.
  6. "NVIDIA's Next Generation CUDA Compute Architecture: Fermi" (PDF). Nvidia.
  7. Lea, R. M. (1988). "ASP: A Cost-Effective Parallel Microcomputer". IEEE Micro. 8 (5): 10–29. doi:10.1109/40.87518. S2CID 25901856.
  8. "Linedancer HD – Overview". Aspex Semiconductor. Archived from the original on 13 October 2006.
  9. Krikelis, A. (1988). Artificial Neural Network on a Massively Parallel Associative Architecture. International Neural Network Conference. Dordrecht: Springer. doi:10.1007/978-94-009-0643-3_39. ISBN 978-94-009-0643-3.
  10. Ódor, Géza; Krikelis, Argy; Vesztergombi, György; Rohrbach, Francois. "Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture" (PDF).
  11. Miyaoka, Y.; Choi, J.; Togawa, N.; Yanagisawa, M.; Ohtsuki, T. (2002). An algorithm of hardware unit generation for processor core synthesis with packed SIMD type instructions. Asia-Pacific Conference on Circuits and Systems. pp. 171–176. doi:10.1109/APCCAS.2002.1114930. hdl:2065/10689. ISBN 0-7803-7690-0.
  12. Spector, A.; Gifford, D. (September 1984). "The space shuttle primary computer system". Communications of the ACM. 27 (9): 872–900. doi:10.1145/358234.358246. S2CID 39724471.
  13. "Single Program Multiple Data stream (SPMD)". Llnl.gov. Archived from the original on 2004-06-04. Retrieved 2013-12-09.
  14. "Programming requirements for compiling, building, and running jobs". Lightning User Guide. Archived from the original on September 1, 2006.
  15. "CTC Virtual Workshop". Web0.tc.cornell.edu. Retrieved 2013-12-09.
  16. "NIST SP2 Primer: Distributed-memory programming". Math.nist.gov. Archived from the original on 2013-12-13. Retrieved 2013-12-09.
  17. "Understanding parallel job management and message passing on IBM SP systems". Archived from the original on February 3, 2007.
  18. "9.2 Strategies". Distributed Memory Programming. Archived from the original on September 10, 2006.
  19. This article is based on material taken from Flynn's taxonomy at the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
  20. "Single program multiple data". Nist.gov. 2004-12-17. Retrieved 2013-12-09.
  21. Darema, Frederica; George, David A.; Norton, V. Alan; Pfister, Gregory F. (1988). "A single-program-multiple-data computational model for EPEX/FORTRAN". Parallel Computing. 7 (1): 11–24. doi:10.1016/0167-8191(88)90094-4.