Address generation unit

Last updated

Intel's Nehalem microarchitecture contains multiple AGUs behind the CPU's reservation station. Intel Nehalem arch.svg
Intel's Nehalem microarchitecture contains multiple AGUs behind the CPU's reservation station.

The address generation unit (AGU), sometimes also called address computation unit (ACU), [1] is an execution unit inside central processing units (CPUs) that calculates addresses used by the CPU to access main memory. By having address calculations handled by separate circuitry that operates in parallel with the rest of the CPU, the number of CPU cycles required for executing various machine instructions can be reduced, bringing performance improvements. [2] [3]

In computer engineering, an execution unit is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program. It may have its own internal control sequence unit, which is not to be confused with the CPU's main control unit, some registers, and other internal units such as an arithmetic logic unit (ALU), address generation unit (AGU), floating-point unit (FPU), load-store unit (LSU), branch execution unit (BEU) or some smaller and more specific components.

Central processing unit electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions

A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s. Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry.

Memory address data concept used at various levels by software and hardware to access the computers primary storage memory; fixed-length sequence of digits conventionally displayed and manipulated as unsigned integers

In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Such numerical semantic bases itself upon features of CPU, as well upon use of the memory like an array endorsed by various programming languages.


While performing various operations, CPUs need to calculate memory addresses required for fetching data from the memory; for example, in-memory positions of array elements must be calculated before the CPU can fetch the data from actual memory locations. Those address-generation calculations involve different integer arithmetic operations, such as addition, subtraction, modulo operations, or bit shifts. Often, calculating a memory address involves more than one general-purpose machine instruction, which do not necessarily decode and execute quickly. By incorporating an AGU into a CPU design, together with introducing specialized instructions that use the AGU, various address-generation calculations can be offloaded from the rest of the CPU, and can often be executed quickly in a single CPU cycle. [2] [3]

Modulo operation

In computing, the modulo operation finds the remainder after division of one number by another.

Instruction cycle basic operation cycle of a computer

The instruction cycle is the cycle which the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instructions. It is composed of three main stages: the fetch stage, the decode stage, and the execute stage.

Capabilities of an AGU depend on a particular CPU and its architecture. Thus, some AGUs implement and expose more address-calculation operations, while some also include more advanced specialized instructions that can operate on multiple operands at a time. [2] [3] Furthermore, some CPU architectures include multiple AGUs so more than one address-calculation operation can be executed simultaneously, bringing further performance improvements by capitalizing on the superscalar nature of advanced CPU designs. For example, Intel incorporates multiple AGUs into its Sandy Bridge and Haswell microarchitectures, which increase bandwidth of the CPU memory subsystem by allowing multiple memory-access instructions to be executed in parallel. [4] [5] [6]

Computer architecture Set of rules and methods that describe the functionality, organization, and implementation of computer systems

In computer engineering, computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer systems. Some definitions of architecture define it as describing the capabilities and programming model of a computer but not a particular implementation. In other definitions computer architecture involves instruction set architecture design, microarchitecture design, logic design, and implementation.

In mathematics an operand is the object of a mathematical operation, i.e. it is the object or quantity that is operated on.

Intel American semiconductor company

Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, in the Silicon Valley. It is the world's second largest and second highest valued semiconductor chip manufacturer based on revenue after being overtaken by Samsung, and is the inventor of the x86 series of microprocessors, the processors found in most personal computers (PCs). Intel ranked No. 46 in the 2018 Fortune 500 list of the largest United States corporations by total revenue.

See also

Arithmetic logic unit digital circuits

An arithmetic logic unit (ALU) is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. An ALU is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs.

Floating-point arithmetic computer format for representing real numbers

In computing, floating-point arithmetic (FP) is arithmetic using formulaic representation of real numbers as an approximation so as to support a trade-off between range and precision. For this reason, floating-point computation is often found in systems which include very small and very large real numbers, which require fast processing times. A number is, in general, represented approximately to a fixed number of significant digits and scaled using an exponent in some fixed base; the base for the scaling is normally two, ten, or sixteen. A number that can be represented exactly is of the following form:

The AMD Bulldozer Family 15h is a microprocessor microarchitecture for the FX and Opteron line of processors, developed by AMD for the desktop and server markets. Bulldozer is the codename for this family of microarchitectures. It was released on October 12, 2011, as the successor to the K10 microarchitecture.

Related Research Articles

Intel 8088

The Intel 8088 microprocessor is a variant of the Intel 8086. Introduced on July 1, 1979, the 8088 had an eight-bit external data bus instead of the 16-bit bus of the 8086. The 16-bit registers and the one megabyte address range were unchanged, however. In fact, according to the Intel documentation, the 8086 and 8088 have the same execution unit (EU)—only the bus interface unit (BIU) is different. The original IBM PC was based on the 8088.

Microcode is a computer hardware technique that imposes an interpreter between the CPU hardware and the programmer-visible instruction set architecture of the computer. As such, the microcode is a layer of hardware-level instructions that implement higher-level machine code instructions or internal state machine sequencing in many digital processing elements. Microcode is used in general-purpose central processing units, although in current desktop CPUs it is only a fallback path for cases that the faster hardwired control unit cannot handle.

Floating-point unit part of a computer system specially designed to carry out operations on floating point numbers

A floating-point unit is a part of a computer system specially designed to carry out operations on floating point numbers. Typical operations are addition, subtraction, multiplication, division, square root, and bitshifting. Some systems can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines.

Superscalar processor CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor that can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows for more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

The Motorola 68000 series is a family of 32-bit CISC microprocessors. During the 1980s and early 1990s, they were popular in personal computers and workstations and were the primary competitors of Intel's x86 microprocessors. They were most well known as the processors powering the early Apple Macintosh, the Commodore Amiga, the Sinclair QL, the Atari ST, the Sega Genesis, and several others. Although no modern desktop computers are based on processors in the 68000 series, derivative processors are still widely used in embedded systems.

Digital signal processor specialized microprocessor optimized for digital signal processing

A digital signal processor (DSP) is a specialized microprocessor, with its architecture optimized for the operational needs of digital signal processing.

In computer architecture, instructions per cycle (IPC) is one aspect of a processor's performance: the average number of instructions executed for each clock cycle. It is the multiplicative inverse of cycles per instruction.


A coprocessor is a computer processor used to supplement the functions of the primary processor. Operations performed by the coprocessor may be floating point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

Tomasulo’s algorithm is a computer architecture hardware algorithm for dynamic scheduling of instructions that allows out-of-order execution and enables more efficient use of multiple execution units. It was developed by Robert Tomasulo at IBM in 1967 and was first implemented in the IBM System/360 Model 91’s floating point unit.

The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of CPUs made by Intel. The first CPU to use this architecture was the Willamette-core Pentium 4, released on November 20, 2000 and the first of the Pentium 4 CPUs; all subsequent Pentium 4 and Pentium D variants have also been based on NetBurst. In mid-2004, Intel released the Foster core, which was also based on NetBurst, thus switching the Xeon CPUs to the new architecture as well. Pentium 4-based Celeron CPUs also use the NetBurst architecture.

Microarchitecture the way a given instruction set architecture (ISA) is implemented on a processor

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

Nehalem (microarchitecture) microarchitecture

Nehalem is the codename for an Intel processor microarchitecture released in November 2008. Nehalem was used in the first generation of the Intel Core processors. Nehalem is the successor to the older Core microarchitecture.

Sandy Bridge Intel processor microarchitecture

Sandy Bridge is the codename for the microarchitecture used in the "second generation" of the Intel Core processors - the Sandy Bridge microarchitecture is the successor to Nehalem microarchitecture. Intel demonstrated a Sandy Bridge processor in 2009, and released first products based on the architecture in January 2011 under the Core brand.

Fermi is the codename for a GPU microarchitecture developed by Nvidia, first released to retail in April 2010, as the successor to the Tesla microarchitecture. It was the primary microarchitecture used in the GeForce 400 series and GeForce 500 series. It was followed by Kepler, and used alongside Kepler in the GeForce 600 series, GeForce 700 series, and GeForce 800 series, in the latter two only in mobile GPUs. In the workstation market, Fermi found use in the Quadro x000 series, Quadro NVS models, as well as in Nvidia Tesla computing modules. All desktop Fermi GPUs were manufactured in 40 nm, mobile Fermi GPUs in 40 nm and 28 nm. Fermi is the oldest microarchitecture from NVIDIA that received support for the Microsoft's rendering API Direct3D 12 feature_level 11.

Goldmont is a microarchitecture for low-power Atom, Celeron and Pentium branded processors used in systems on a chip (SoCs) made by Intel. The Apollo Lake platform with 14 nm Goldmont core was unveiled at the Intel Developer Forum (IDF) in Shenzhen, China, April 2016. The Goldmont architecture borrows heavily from the Skylake Core processors, so it offers a more than 30 percent performance boost compared to the previous Braswell platform, and it can be used to implement power-efficient low-end devices including Cloudbooks, 2-in-1 netbooks, small PCs, IP cameras, and in-car entertainment systems.


  1. Cornelis Van Berkel; Patrick Meuwissen (January 12, 2006). "Address generation unit for a processor (US 2006010255 A1 patent application)". Retrieved December 8, 2014.
  2. 1 2 3 "Chapter 4: Address Generation Unit (DSP56300 Family Manual)" (PDF). September 16, 1999. Retrieved December 8, 2014.
  3. 1 2 3 Darek Mihocka (December 27, 2000). "Pentium 4: Round 1 – Intel blows the lead". Retrieved December 8, 2014.
  4. David Kanter (September 25, 2010). "Intel's Sandy Bridge Microarchitecture: Memory Subsystem". Retrieved December 8, 2014.
  5. David Kanter (November 13, 2012). "Intel's Haswell CPU Microarchitecture: Haswell Memory Hierarchy". Retrieved December 8, 2014.
  6. Per Hammarlund (August 2013). "Fourth-Generation Intel Core Processor, codenamed Haswell" (PDF). p. 25. Retrieved December 8, 2014.