Arithmetic logic unit

Last updated
A symbolic representation of an ALU and its input and output signals, indicated by arrows pointing into or out of the ALU, respectively. Each arrow represents one or more signals. Control signals enter from the left and status signals exit on the right; data flows from top to bottom. ALU block.gif
A symbolic representation of an ALU and its input and output signals, indicated by arrows pointing into or out of the ALU, respectively. Each arrow represents one or more signals. Control signals enter from the left and status signals exit on the right; data flows from top to bottom.

In computing, an arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. [1] [2] [3] This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. It is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). [4]

Contents

The inputs to an ALU are the data to be operated on, called operands, and a code indicating the operation to be performed; the ALU's output is the result of the performed operation. In many designs, the ALU also has status inputs or outputs, or both, which convey information about a previous operation or the current operation, respectively, between the ALU and external status registers.

Signals

An ALU has a variety of input and output nets, which are the electrical conductors used to convey digital signals between the ALU and external circuitry. When an ALU is operating, external circuits apply signals to the ALU inputs and, in response, the ALU produces and conveys signals to external circuitry via its outputs.

Data

A basic ALU has three parallel data buses consisting of two input operands (A and B) and a result output (Y). Each data bus is a group of signals that conveys one binary integer number. Typically, the A, B and Y bus widths (the number of signals comprising each bus) are identical and match the native word size of the external circuitry (e.g., the encapsulating CPU or other processor).

Opcode

The opcode input is a parallel bus that conveys to the ALU an operation selection code, which is an enumerated value that specifies the desired arithmetic or logic operation to be performed by the ALU. The opcode size (its bus width) determines the maximum number of distinct operations the ALU can perform; for example, a four-bit opcode can specify up to sixteen different ALU operations. Generally, an ALU opcode is not the same as a machine language opcode, though in some cases it may be directly encoded as a bit field within a machine language opcode.

Status

Outputs

The status outputs are various individual signals that convey supplemental information about the result of the current ALU operation. General-purpose ALUs commonly have status signals such as:

  • Carry-out, which conveys the carry resulting from an addition operation, the borrow resulting from a subtraction operation, or the overflow bit resulting from a binary shift operation.
  • Zero, which indicates all bits of Y are logic zero.
  • Negative, which indicates the result of an arithmetic operation is negative.
  • Overflow , which indicates the result of an arithmetic operation has exceeded the numeric range of Y.
  • Parity , which indicates whether an even or odd number of bits in Y are logic one.

Upon completion of each ALU operation, the status output signals are usually stored in external registers to make them available for future ALU operations (e.g., to implement multiple-precision arithmetic) or for controlling conditional branching. The collection of bit registers that store the status outputs are often treated as a single, multi-bit register, which is referred to as the "status register" or "condition code register".

Inputs

The status inputs allow additional information to be made available to the ALU when performing an operation. Typically, this is a single "carry-in" bit that is the stored carry-out from a previous ALU operation.

Circuit operation

The combinational logic circuitry of the 74181 integrated circuit, an early four-bit ALU, with logic gates 74181aluschematic.png
The combinational logic circuitry of the 74181 integrated circuit, an early four-bit ALU, with logic gates

An ALU is a combinational logic circuit, meaning that its outputs will change asynchronously in response to input changes. In normal operation, stable signals are applied to all of the ALU inputs and, when enough time (known as the "propagation delay") has passed for the signals to propagate through the ALU circuitry, the result of the ALU operation appears at the ALU outputs. The external circuitry connected to the ALU is responsible for ensuring the stability of ALU input signals throughout the operation, and for allowing sufficient time for the signals to propagate through the ALU before sampling the ALU result.

In general, external circuitry controls an ALU by applying signals to its inputs. Typically, the external circuitry employs sequential logic to control the ALU operation, which is paced by a clock signal of a sufficiently low frequency to ensure enough time for the ALU outputs to settle under worst-case conditions.

For example, a CPU begins an ALU addition operation by routing operands from their sources (which are usually registers) to the ALU's operand inputs, while the control unit simultaneously applies a value to the ALU's opcode input, configuring it to perform addition. At the same time, the CPU also routes the ALU result output to a destination register that will receive the sum. The ALU's input signals, which are held stable until the next clock, are allowed to propagate through the ALU and to the destination register while the CPU waits for the next clock. When the next clock arrives, the destination register stores the ALU result and, since the ALU operation has completed, the ALU inputs may be set up for the next ALU operation.

Functions

A number of basic arithmetic and bitwise logic functions are commonly supported by ALUs. Basic, general purpose ALUs typically include these operations in their repertoires: [1] [2] [3] [5]

Arithmetic operations

Bitwise logical operations

Bit shift operations

Bit shift examples for an eight-bit ALU
TypeLeftRight
Arithmetic shift Rotate left logically.svg Rotate right arithmetically.svg
Logical shift Rotate left logically.svg Rotate right logically.svg
Rotate Rotate left.svg Rotate right.svg
Rotate through carry Rotate left through carry.svg Rotate right through carry.svg

ALU shift operations cause operand A (or B) to shift left or right (depending on the opcode) and the shifted operand appears at Y. Simple ALUs typically can shift the operand by only one bit position, whereas more complex ALUs employ barrel shifters that allow them to shift the operand by an arbitrary number of bits in one operation. In all single-bit shift operations, the bit shifted out of the operand appears on carry-out; the value of the bit shifted into the operand depends on the type of shift.

Applications

Multiple-precision arithmetic

In integer arithmetic computations, multiple-precision arithmetic is an algorithm that operates on integers which are larger than the ALU word size. To do this, the algorithm treats each operand as an ordered collection of ALU-size fragments, arranged from most-significant (MS) to least-significant (LS) or vice versa. For example, in the case of an 8-bit ALU, the 24-bit integer 0x123456 would be treated as a collection of three 8-bit fragments: 0x12 (MS), 0x34, and 0x56 (LS). Since the size of a fragment exactly matches the ALU word size, the ALU can directly operate on this "piece" of operand.

The algorithm uses the ALU to directly operate on particular operand fragments and thus generate a corresponding fragment (a "partial") of the multi-precision result. Each partial, when generated, is written to an associated region of storage that has been designated for the multiple-precision result. This process is repeated for all operand fragments so as to generate a complete collection of partials, which is the result of the multiple-precision operation.

In arithmetic operations (e.g., addition, subtraction), the algorithm starts by invoking an ALU operation on the operands' LS fragments, thereby producing both a LS partial and a carry out bit. The algorithm writes the partial to designated storage, whereas the processor's state machine typically stores the carry out bit to an ALU status register. The algorithm then advances to the next fragment of each operand's collection and invokes an ALU operation on these fragments along with the stored carry bit from the previous ALU operation, thus producing another (more significant) partial and a carry out bit. As before, the carry bit is stored to the status register and the partial is written to designated storage. This process repeats until all operand fragments have been processed, resulting in a complete collection of partials in storage, which comprise the multi-precision arithmetic result.

In multiple-precision shift operations, the order of operand fragment processing depends on the shift direction. In left-shift operations, fragments are processed LS first because the LS bit of each partial—which is conveyed via the stored carry bit—must be obtained from the MS bit of the previously left-shifted, less-significant operand. Conversely, operands are processed MS first in right-shift operations because the MS bit of each partial must be obtained from the LS bit of the previously right-shifted, more-significant operand.

In bitwise logical operations (e.g., logical AND, logical OR), the operand fragments may be processed in any arbitrary order because each partial depends only on the corresponding operand fragments (the stored carry bit from the previous ALU operation is ignored).

Complex operations

Although an ALU can be designed to perform complex functions, the resulting higher circuit complexity, cost, power consumption and larger size makes this impractical in many cases. Consequently, ALUs are often limited to simple functions that can be executed at very high speeds (i.e., very short propagation delays), and the external processor circuitry is responsible for performing complex functions by orchestrating a sequence of simpler ALU operations.

For example, computing the square root of a number might be implemented in various ways, depending on ALU complexity:

The implementations above transition from fastest and most expensive to slowest and least costly. The square root is calculated in all cases, but processors with simple ALUs will take longer to perform the calculation because multiple ALU operations must be performed.

Implementation

An ALU is usually implemented either as a stand-alone integrated circuit (IC), such as the 74181, or as part of a more complex IC. In the latter case, an ALU is typically instantiated by synthesizing it from a description written in VHDL, Verilog or some other hardware description language. For example, the following VHDL code describes a very simple 8-bit ALU:

entityaluisport(-- the alu connections to external circuitry:A:insigned(7downto0);-- operand AB:insigned(7downto0);-- operand BOP:inunsigned(2downto0);-- opcodeY:outsigned(7downto0));-- operation resultendalu;architecturebehavioralofaluisbegincaseOPis-- decode the opcode and perform the operation:when"000"=>Y<=A+B;-- addwhen"001"=>Y<=A-B;-- subtractwhen"010"=>Y<=A-1;-- decrementwhen"011"=>Y<=A+1;-- incrementwhen"100"=>Y<=notA;-- 1's complementwhen"101"=>Y<=AandB;-- bitwise ANDwhen"110"=>Y<=AorB;-- bitwise ORwhen"111"=>Y<=AxorB;-- bitwise XORwhenothers=>Y<=(others=>'X');endcase;endbehavioral;

History

Mathematician John von Neumann proposed the ALU concept in 1945 in a report on the foundations for a new computer called the EDVAC. [6]

The cost, size, and power consumption of electronic circuitry was relatively high throughout the infancy of the information age. Consequently, all early computers had a serial ALU that operated on one data bit at a time although they often presented a wider word size to programmers. The first computer to have multiple parallel discrete single-bit ALU circuits was the 1951 Whirlwind I, which employed sixteen such "math units" to enable it to operate on 16-bit words.

In 1967, Fairchild introduced the first ALU-like device implemented as an integrated circuit, the Fairchild 3800, consisting of an eight-bit arithmetic unit with accumulator. It only supported adds and subtracts but no logic functions. [7]

Full integrated-circuit ALUs soon emerged, including four-bit ALUs such as the Am2901 and 74181. These devices were typically "bit slice" capable, meaning they had "carry look ahead" signals that facilitated the use of multiple interconnected ALU chips to create an ALU with a wider word size. These devices quickly became popular and were widely used in bit-slice minicomputers.

Microprocessors began to appear in the early 1970s. Even though transistors had become smaller, there was sometimes insufficient die space for a full-word-width ALU and, as a result, some early microprocessors employed a narrow ALU that required multiple cycles per machine language instruction. Examples of this includes the popular Zilog Z80, which performed eight-bit additions with a four-bit ALU. [8] Over time, transistor geometries shrank further, following Moore's law, and it became feasible to build wider ALUs on microprocessors.

Modern integrated circuit (IC) transistors are orders of magnitude smaller than those of the early microprocessors, making it possible to fit highly complex ALUs on ICs. Today, many modern ALUs have wide word widths, and architectural enhancements such as barrel shifters and binary multipliers that allow them to perform, in a single clock cycle, operations that would have required multiple operations on earlier ALUs.

ALUs can be realized as mechanical, electro-mechanical or electronic circuits [9] [ failed verification ] and, in recent years, research into biological ALUs has been carried out [10] [11] (e.g., actin-based). [12]

See also

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Data General Nova</span> 16-bit minicomputer series

The Data General Nova is a series of 16-bit minicomputers released by the American company Data General. The Nova family was very popular in the 1970s and ultimately sold tens of thousands of units.

In digital circuits, an adder–subtractor is a circuit that is capable of adding or subtracting numbers. Below is a circuit that adds or subtracts depending on a control signal. It is also possible to construct a circuit that performs both addition and subtraction at the same time.

A shift register is a type of digital circuit using a cascade of flip-flops where the output of one flip-flop is connected to the input of the next. They share a single clock signal, which causes the data stored in the system to shift from one location to the next. By connecting the last flip-flop back to the first, the data can cycle within the shifters for extended periods, and in this configuration they were used as computer memory, displacing delay-line memory systems in the late 1960s and early 1970s.

In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation of that ISA.

<span class="mw-page-title-main">MCS-51</span> Single chip microcontroller series by Intel

The Intel MCS-51 is a single chip microcontroller (MCU) series developed by Intel in 1980 for use in embedded systems. The architect of the Intel MCS-51 instruction set was John H. Wharton. Intel's original versions were popular in the 1980s and early 1990s, and enhanced binary compatible derivatives remain popular today. It is a complex instruction set computer, but also has some of the features of RISC architectures, such as a large register set and register windows, and has separate memory spaces for program instructions and data.

<span class="mw-page-title-main">Intel 8085</span> 8-bit microprocessor by Intel

The Intel 8085 ("eighty-eighty-five") is an 8-bit microprocessor produced by Intel and introduced in March 1976. It is the last 8-bit microprocessor developed by Intel.

In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor. Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands.

Two's complement is the most common method of representing signed integers on computers, and more generally, fixed point binary values. Two's complement uses the binary digit with the greatest place value as the sign to indicate whether the binary number is positive or negative. When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.

<span class="mw-page-title-main">Barrel shifter</span> Digital circuit found in computers

A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the use of any sequential logic, only pure combinational logic, i.e. it inherently provides a binary operation. It can however in theory also be used to implement unary operations, such as logical shift left, in cases where limited by a fixed amount. One way to implement a barrel shifter is as a sequence of multiplexers where the output of one multiplexer is connected to the input of the next multiplexer in a way that depends on the shift distance. A barrel shifter is often used to shift and rotate n-bits in modern microprocessors, typically within a single clock cycle.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

An adder, or summer, is a digital circuit that performs addition of numbers. In many computers and other kinds of processors, adders are used in the arithmetic logic units (ALUs). They are also used in other parts of the processor, where they are used to calculate addresses, table indices, increment and decrement operators and similar operations.

In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements.

In computer processors, the overflow flag is usually a single bit in a system status register used to indicate when an arithmetic overflow has occurred in an operation, indicating that the signed two's-complement result would not fit in the number of bits used for the result. Some architectures may be configured to automatically generate an exception on an operation resulting in overflow.

In computer processors the carry flag is a single bit in a system status register/flag register used to indicate when an arithmetic carry or borrow has been generated out of the most significant arithmetic logic unit (ALU) bit position. The carry flag enables numbers larger than a single ALU width to be added/subtracted by carrying (adding) a binary digit from a partial addition/subtraction to the least significant bit position of a more significant word. This is typically programmed by the user of the processor on the assembly or machine code level, but can also happen internally in certain processors, via digital logic or microcode, where some processors have wider registers and arithmetic instructions than ALU. It is also used to extend bit shifts and rotates in a similar manner on many processors. For subtractive operations, two (opposite) conventions are employed as most machines set the carry flag on borrow while some machines instead reset the carry flag on borrow.

<span class="mw-page-title-main">74181</span> First arithmetic logic unit (ALU) on a single chip

The 74181 is a 4-bit slice arithmetic logic unit (ALU), implemented as a 7400 series TTL integrated circuit. Introduced by Texas Instruments in February 1970, it was the first complete ALU on a single chip. It was used as the arithmetic/logic core in the CPUs of many historically significant minicomputers and other devices.

<span class="mw-page-title-main">MikroSim</span> Educational computer program released in 1992

MikroSim is an educational computer program for hardware-non-specific explanation of the general functioning and behaviour of a virtual processor, running on the Microsoft Windows operating system. Devices like miniaturized calculators, microcontroller, microprocessors, and computer can be explained on custom-developed instruction code on a register transfer level controlled by sequences of micro instructions (microcode). Based on this it is possible to develop an instruction set to control a virtual application board at higher level of abstraction.

In the C programming language, operations can be performed on a bit level using bitwise operators.

<span class="mw-page-title-main">Signetics 8X300</span> Signetics microprocessor

The 8X300 is a microprocessor produced and marketed by Signetics starting 1976 as a second source for the SMS 300 by Scientific Micro Systems, Inc. Although SMS developed the SMS 300, Signetics was the sole manufacturer of this product line. In 1978 Signetics purchased the rights to the SMS 300 series and renamed it 8X300.

<span class="mw-page-title-main">Address generation unit</span>

The address generation unit (AGU), sometimes also called address computation unit (ACU), is an execution unit inside central processing units (CPUs) that calculates addresses used by the CPU to access main memory. By having address calculations handled by separate circuitry that operates in parallel with the rest of the CPU, the number of CPU cycles required for executing various machine instructions can be reduced, bringing performance improvements.

References

  1. 1 2 Atul P. Godse; Deepali A. Godse (2009). "3". Digital Logic Design. Technical Publications. pp. 9–3. ISBN   978-81-8431-738-1.[ permanent dead link ]
  2. 1 2 Leadership Education and Training (LET) 2: Programmed Text. Headquarters, Department of the Army. 2001. pp. 371–.
  3. 1 2 Atul P. Godse; Deepali A. Godse (2009). "Appendix". Digital Logic Circuits. Technical Publications. pp. C–1. ISBN   978-81-8431-650-6.[ permanent dead link ]
  4. "1. An Introduction to Computer Architecture - Designing Embedded Hardware, 2nd Edition [Book]". www.oreilly.com. Retrieved 2020-09-03.
  5. Horowitz, Paul; Winfield Hill (1989). "14.1.1". The Art of Electronics (2nd ed.). Cambridge University Press. pp. 990–. ISBN   978-0-521-37095-0.
  6. Philip Levis (November 8, 2004). "Jonathan von Neumann and EDVAC" (PDF). cs.berkeley.edu. pp. 1, 3. Archived from the original (PDF) on September 23, 2015. Retrieved January 20, 2015.
  7. Sherriff, Ken. "Inside the 74181 ALU chip: die photos and reverse engineering". Ken Shirriff's blog. Retrieved 7 May 2024.
  8. Ken Shirriff. "The Z-80 has a 4-bit ALU. Here's how it works." 2013, righto.com
  9. Reif, John H. (2009), "Mechanical Computing: The Computational Complexity of Physical Devices", in Meyers, Robert A. (ed.), Encyclopedia of Complexity and Systems Science, New York, NY: Springer, pp. 5466–5482, doi:10.1007/978-0-387-30440-3_325, ISBN   978-0-387-30440-3 , retrieved 2020-09-03
  10. Lin, Chun-Liang; Kuo, Ting-Yu; Li, Wei-Xian (2018-08-14). "Synthesis of control unit for future biocomputer". Journal of Biological Engineering. 12 (1): 14. doi: 10.1186/s13036-018-0109-4 . ISSN   1754-1611. PMC   6092829 . PMID   30127848.
  11. Gerd Hg Moe-Behrens. "The biological microprocessor, or how to build a computer with biological parts".
  12. Das, Biplab; Paul, Avijit Kumar; De, Debashis (2019-08-16). "An unconventional Arithmetic Logic Unit design and computing in Actin Quantum Cellular Automata". Microsystem Technologies. 28 (3): 809–822. doi:10.1007/s00542-019-04590-1. ISSN   1432-1858. S2CID   202099203.

Further reading