Bit-level parallelism

Last updated

Bit-level parallelism is a form of parallel computing based on increasing processor word size. Increasing the word size reduces the number of instructions the processor must execute in order to perform an operation on variables whose sizes are greater than the length of the word. (For example, consider a case where an 8-bit processor must add two 16-bit integers. The processor must first add the 8 lower-order bits from each integer, then add the 8 higher-order bits, requiring two instructions to complete a single operation. A 16-bit processor would be able to complete the operation with single instruction.)

Parallel computing programming paradigm in which many calculations or the execution of processes are carried out simultaneously

Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but it's gaining broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

In computer architecture, 8-bit integers, memory addresses, or other data units are those that are 8 bits wide. Also, 8-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. 8-bit is also a generation of microcomputers in which 8-bit microprocessors were the norm.

In computer architecture, 16-bit integers, memory addresses, or other data units are those that are 16 bits wide. Also, 16-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. 16-bit microcomputers are computers in which 16-bit microprocessors were the norm.

Originally, all electronic computers were serial (single-bit) computers. The first electronic computer that was not a serial computer—the first bit-parallel computer—was the 16-bit Whirlwind from 1951.

A serial computer is a computer typified by bit-serial architecture — i.e., internally operating on one bit or digit for each clock cycle. Machines with serial main storage devices such as acoustic or magnetostrictive delay lines and rotating magnetic devices were usually serial computers.

From the advent of very-large-scale integration (VLSI) computer chip fabrication technology in the 1970s until about 1986, advancements in computer architecture were done by increasing bit-level parallelism, [1] as 4-bit microprocessors were replaced by 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which have been a standard in general purpose computing for two decades. Only recently, with the advent of x86-64 architectures, have 64-bit processors become commonplace.

In computer architecture, 4-bit integers, memory addresses, or other data units are those that are 4 bits wide. Also, 4-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. A group of four bits is also called a nibble and has 24 = 16 possible values.

In computer architecture, 32-bit integers, memory addresses, or other data units are those that are 32 bits wide. Also, 32-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. 32-bit microcomputers are computers in which 32-bit microprocessors are the norm.

On 32-bit processors, external data bus width continues to increase. For example, DDR1 SDRAM transfers 128 bits per clock cycle. DDR2 SDRAM transfers a minimum of 256 bits per burst.

Bus (computing) communication system that transfers data between components inside a computer

In computer architecture, a bus is a communication system that transfers data between components inside a computer, or between computers. This expression covers all related hardware components and software, including communication protocols.


Double Data Rate 2 Synchronous Dynamic Random-Access Memory, officially abbreviated as DDR2 SDRAM, is a double data rate synchronous dynamic random-access memory interface. It superseded the original DDR SDRAM specification, and is superseded by DDR3 SDRAM. DDR2 DIMMs are neither forward compatible with DDR3 nor backward compatible with DDR.

See also

SIMD class of parallel computers in Flynns taxonomy, with multiple processing elements that perform the same operation on multiple data points simultaneously

Single instruction, multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment. SIMD is particularly applicable to common tasks such as adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions to improve the performance of multimedia use. Not to be confused with SIMT which utilizes threads.

SIMD within a register (SWAR) is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data.

Related Research Articles

Central processing unit electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions

A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s. Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry.

DEC Alpha 64-bit RISC microprocessor

Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computing (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC), designed to replace their 32-bit VAX complex instruction set computer (CISC) ISA. Alpha was implemented in microprocessors originally developed and fabricated by DEC. These microprocessors were most prominently used in a variety of DEC workstations and servers, which eventually formed the basis for almost all of their mid-to-upper-scale lineup. Several third-party vendors also produced Alpha systems, including PC form factor motherboards.

MIPS is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Computer Systems.

x86 family of instruction set architectures

x86 is a family of instruction set architectures based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series of Central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow!. SSE contains 70 new instructions, most of which work on single precision floating point data. SIMD instructions can greatly increase performance when the exact same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.


IA-64 is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was evolved and then implemented in a new processor microarchitecture by Intel with HP's continued partnership and expertise on the underlying EPIC design concepts. In order to establish what was their first new ISA in 20 years and bring an entirely new product line to market, Intel made a massive investment in product definition, design, software development tools, OS, software industry partnerships, and marketing. To support this effort Intel created the largest design team in their history and a new marketing and industry enabling team completely separate from x86. The first Itanium processor, codenamed Merced, was released in 2001.


The transputer is a series of pioneering microprocessors from the 1980s, featuring integrated memory and serial communication links, intended for parallel computing. They were designed and produced by Inmos, a semiconductor company based in Bristol, United Kingdom.

Synchronous dynamic random-access memory (SDRAM) is any dynamic random-access memory (DRAM) where the operation of its external pin interface is coordinated by an externally supplied clock signal. DRAM integrated circuits (ICs) produced from the early 1970s to mid-1990s used an asynchronous interface, in which input control signals have a direct effect on internal functions only delayed by the trip across its semiconductor pathways. SDRAM has a synchronous interface, whereby changes on control inputs are recognised after a rising edge of its clock input. In SDRAM families standardized by JEDEC, the clock signal controls the stepping of an internal finite state machine that responds to incoming commands. These commands can be pipelined to improve performance, with previously started operations completing while new commands are received. The memory is divided into several equally sized but independent sections called banks, allowing the device to operate on a memory access command in each bank simultaneously and speed up access in an interleaved fashion. This allows SDRAMs to achieve greater concurrency and higher data transfer rates than asynchronous DRAMs could.

The Intel i860 was a RISC microprocessor design introduced by Intel in 1989. It was one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the 1980s. It was released with considerable fanfare, slightly obscuring the earlier Intel i960, which was successful in some niches of embedded systems, and which many considered to be a better design. The i860 never achieved commercial success and the project was terminated in the mid-1990s.

In computer architecture, 64-bit computing is the use of processors that have datapath widths, integer size, and memory address widths of 64 bits. Also, 64-bit computer architectures for central processing units (CPUs) and arithmetic logic units (ALUs) are those that are based on processor registers, address buses, or data buses of that size. From the software perspective, 64-bit computing means the use of code with 64-bit virtual memory addresses. However, not all 64-bit instruction sets support full 64-bit virtual memory addresses; x86-64 and ARMv8, for example, support only 48 bits of virtual address, with the remaining 16 bits of the virtual address required to be all 0's or all 1's, and several 64-bit instruction sets support fewer than 64 bits of physical memory address.


The Blackfin is a family of 16-/32-bit microprocessors developed, manufactured and marketed by Analog Devices. The processors have built-in, fixed-point digital signal processor (DSP) functionality supplied by 16-bit multiply–accumulates (MACs), accompanied on-chip by a microcontroller. It was designed for a unified low-power processor architecture that can run operating systems while simultaneously handling complex numeric tasks such as real-time H.264 video encoding.

Semiconductor memory is a digital electronic data storage device, often used as computer memory, implemented with semiconductor electronic devices on an integrated circuit (IC). There are many different types of implementations using various technologies.

In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word is an important characteristic of any specific processor design or computer architecture.

In computing, hardware acceleration is the use of computer hardware specially made to perform some functions more efficiently than is possible in software running on a general-purpose CPU. Any transformation of data or routine that can be computed, can be calculated purely in software running on a generic CPU, purely in custom-made hardware, or in some mix of both. An operation can be computed faster in application-specific hardware designed or programmed to compute the operation than specified in software and performed on a general-purpose computer processor. Each approach has advantages and disadvantages. The implementation of computing tasks in hardware to decrease latency and increase throughput is known as hardware acceleration.


The R10000, code-named "T5", is a RISC microprocessor implementation of the MIPS IV instruction set architecture (ISA) developed by MIPS Technologies, Inc. (MTI), then a division of Silicon Graphics, Inc. (SGI). The chief designers are Chris Rowen and Kenneth C. Yeager. The R10000 microarchitecture is known as ANDES, an abbreviation for Architecture with Non-sequential Dynamic Execution Scheduling. The R10000 largely replaces the R8000 in the high-end and the R4400 elsewhere. MTI was a fabless semiconductor company; the R10000 was fabricated by NEC and Toshiba. Previous fabricators of MIPS microprocessors such as Integrated Device Technology (IDT) and three others did not fabricate the R10000 as it was more expensive to do so than the R4000 and R4400.

The SPARC64 V (Zeus) is a SPARC V9 microprocessor designed by Fujitsu. The SPARC64 V was the basis for a series of successive processors designed for servers, and later, supercomputers.

Arithmetic logic unit digital circuits

An arithmetic logic unit (ALU) is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. An ALU is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs.


  1. David E. Culler, Jaswinder Pal Singh, Anoop Gupta. Parallel Computer Architecture - A Hardware/Software Approach. Morgan Kaufmann Publishers, 1999. ISBN   1-55860-343-3, pg 15