The following is a comparison of CPU microarchitectures.
Microarchitecture | Year | Pipeline stages | Misc |
---|---|---|---|
Elbrus-8S | 2014 | | VLIW, Elbrus (proprietary, closed) version 5, 64-bit |
AMD K5 | 1996 | 5 | Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming [lower-alpha 1] |
AMD K6 | 1997 | 6 | Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming [lower-alpha 2] |
AMD K6-III | 1999 | | Branch prediction, speculative execution, out-of-order execution [1] |
AMD K7 | 1999 | | Out-of-order execution, branch prediction, Harvard architecture |
AMD K8 | 2003 | | 64-bit, integrated memory controller, 16-byte instruction prefetching |
AMD K10 | 2007 | | Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching |
ARM7TDMI (-S) | 2001 | 3 | |
ARM7EJ-S | 2001 | 5 | |
ARM810 | | 5 | Static branch prediction, double-bandwidth memory |
ARM9TDMI | 1998 | 5 | |
ARM1020E | | 6 | |
XScale PXA210/PXA250 | 2002 | 7 | |
ARM1136J(F)-S | | 8 | |
ARM1156T2(F)-S | | 9 | |
ARM Cortex-A5 | | 8 | Multi-core, single issue, in-order |
ARM Cortex-A7 MPCore | | 8 | Partial dual-issue, in-order, 2-way set associative level 1 instruction cache |
ARM Cortex-A8 | 2005 | 13 | Dual-issue, in-order, speculative execution, superscalar, 2-way pipeline decode |
ARM Cortex-A9 MPCore | 2007 | 8–11 | Out-of-order, speculative issue, superscalar |
ARM Cortex-A15 MPCore | 2010 | 15 | Multi-core (up to 16), out-of-order, speculative issue, 3-way superscalar |
ARM Cortex-A53 | 2012 | | Partial dual-issue, in-order |
ARM Cortex-A55 | 2017 | 8 | In-order, speculative execution |
ARM Cortex-A57 | 2012 | | Deeply out-of-order, wide multi-issue, 3-way superscalar |
ARM Cortex-A72 | 2015 | | |
ARM Cortex-A73 | 2016 | | Out-of-order superscalar |
ARM Cortex-A75 | 2017 | 11–13 | Out-of-order superscalar, speculative execution, register renaming, 3-way |
ARM Cortex-A76 | 2018 | 13 | Out-of-order superscalar, 4-way pipeline decode |
ARM Cortex-A77 | 2019 | 13 | Out-of-order superscalar, speculative execution, register renaming, 6-way pipeline decode, 10-issue, branch prediction, L3 cache |
ARM Cortex-A78 | 2020 | 13 | Out-of-order superscalar, register renaming, 4-way pipeline decode, 6 instructions per cycle, branch prediction, L3 cache |
ARM Cortex-A710 | 2021 | 10 | |
ARM Cortex-X1 | 2020 | 13 | 5-wide decode out-of-order superscalar, L3 cache |
ARM Cortex-X2 | 2021 | 10 | |
ARM Cortex-X3 | 2022 | 9 | |
ARM Cortex-X4 | 2023 | 10 | |
AVR32 AP7 | | 7 | |
AVR32 UC3 | | 3 | Harvard architecture |
Bobcat | 2011 | | Out-of-order execution |
Bulldozer | 2011 | 20 | Shared multithreaded L2 cache, multithreading, multi-core, around 20 stage long pipeline, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, Virtualization, Turbo Core, FlexFPU which uses simultaneous multithreading [2] |
Piledriver | 2012 | | Shared multithreaded L2 cache, multithreading, multi-core, around 20 stage long pipeline, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, Virtualization, FlexFPU which uses simultaneous multithreading [2], up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core |
Steamroller | 2014 | | Multi-core, branch prediction |
Excavator | 2015 | 20 | Multi-core |
Zen | 2017 | 19 | Multi-core, superscalar, 2-way simultaneous multithreading, 4-way decode, out-of-order execution, L3 cache |
Zen+ | 2018 | 19 | Multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache |
Zen 2 | 2019 | 19 | Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache |
Zen 3 | 2020 | 19 | Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, SMT, L3 cache |
Zen 4 | 2022 | | Multi-chip module, multi-core, superscalar, L3 cache |
Crusoe | 2000 | | In-order execution, 128-bit VLIW, integrated memory controller |
Efficeon | 2004 | | In-order execution, 256-bit VLIW, fully integrated memory controller |
Cyrix Cx5x86 | 1995 | 6 [3] | Branch prediction |
Cyrix 6x86 | 1996 | | Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution |
DLX | | 5 | |
eSi-3200 | | 5 | In-order, speculative issue |
eSi-3250 | | 5 | In-order, speculative issue |
EV4 (Alpha 21064) | | | Superscalar |
EV7 (Alpha 21364) | | | Superscalar design with out-of-order execution, branch prediction, 4-way simultaneous multithreading, integrated memory controller |
EV8 (Alpha 21464) | | | Superscalar design with out-of-order execution |
65k | | | Ultra low power consumption, register renaming, out-of-order execution, branch prediction, multi-core, module, capable of reaching higher clock speeds |
P5 (Pentium) | 1993 | 5 | Superscalar |
P6 (Pentium Pro) | | 14 | Speculative execution, register renaming, superscalar design with out-of-order execution |
P6 (Pentium II) | | 14 [4] | Branch prediction |
P6 (Pentium III) | 1995 | 14 [4] | |
Intel Itanium "Merced" | 2001 | | Single core, L3 cache |
Intel Itanium 2 "McKinley" | 2002 | 11 [5] | Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multi-core, coarse-grained multithreading, 2-way simultaneous multithreading, Dual-domain multithreading, Turbo Boost, Virtualization, VLIW, RAS with Advanced Machine Check Architecture, Instruction Replay technology, Cache Safe technology, Enhanced SpeedStep technology |
Intel NetBurst (Willamette) | 2000 | 20 | 2-way simultaneous multithreading (Hyper-threading), Rapid Execution Engine, Execution Trace Cache, quad-pumped Front-Side Bus, Hyper-pipelined Technology, superscalar, out-of-order |
NetBurst (Northwood) | 2002 | 20 | 2-way simultaneous multithreading |
NetBurst (Prescott) | 2004 | 31 | 2-way simultaneous multithreading |
NetBurst (Cedar Mill) | 2006 | 31 | 2-way simultaneous multithreading |
Intel Core | 2006 | 12 | Multi-core, out-of-order, 4-way superscalar |
Intel Atom | | 16 | 2-way simultaneous multithreading, in-order, no instruction reordering, speculative execution, or register renaming |
Intel Atom Oak Trail | | | 2-way simultaneous multithreading, in-order, burst mode, 512 KB L2 cache |
Intel Atom Bonnell | 2008 | | SMT |
Intel Atom Silvermont | 2013 | | Out-of-order execution |
Intel Atom Goldmont | 2016 | | Multi-core, out-of-order execution, 3-wide superscalar pipeline, L2 cache |
Intel Atom Goldmont Plus | 2017 | | Multi-core |
Intel Atom Tremont | 2019 | | Multi-core, superscalar, out-of-order execution, speculative execution, register renaming |
Intel Atom Gracemont | 2021 | | Multi-core, superscalar, out-of-order execution, speculative execution, register renaming |
Intel Atom Crestmont | 2023 | | Multi-core |
Intel Atom Skymont | 2024 | | Multi-core |
Nehalem | 2008 | 14 | 2-way simultaneous multithreading, out-of-order, 6-way superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost |
Sandy Bridge | 2011 | 14 | 2-way simultaneous multithreading, multi-core, on-die graphics and PCIe controller, system agent with integrated memory and display controller, ring interconnect, L1/L2/L3 cache, micro-op cache, 2 threads per core, Turbo Boost |
Intel Haswell | 2013 | 14–19 | SoC design, multi-core, multithreading, 2-way simultaneous multithreading, hardware-based transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order execution, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme) |
Broadwell | 2014 | 14–19 | Multi-core, multithreading |
Skylake | 2015 | 14–19 | Multi-core, L4 cache on certain Skylake-R, Skylake-U and Skylake-Y models. On-package PCH on U, Y, m3, m5 and m7 models. 5 wide superscalar/5 issues. |
Kaby Lake | 2016 | 14–19 | Multi-core, L4 cache on certain low and ultra low power models (Kaby Lake-U and Kaby Lake-Y) |
Intel Sunny Cove | 2019 | 14–20 | Multicore, 2-way multithreading, massive OoOE engine, 5 wide superscalar/5 issue. |
Intel Cypress Cove | 2021 | 14 | Multicore, 5 wide superscalar/6 issues, massive OoOE engine, big core design. |
Intel Willow Cove | 2020 | | Multicore, SMT |
Intel Golden Cove | 2021 | | Multicore, SMT |
Intel Redwood Cove | 2023 | | Multicore, SMT |
Intel Lion Cove | 2024 | | Multicore, without SMT |
Intel Xeon Phi 7120x | 2013 | 7-stage integer, 6-stage vector | Multi-core, multithreading, 4 hardware-based simultaneous threads per core which can't be disabled unlike regular HyperThreading, Time-multiplexed multithreading, 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order dual-issue pipelines, coprocessor, Floating-point accelerator, 512-bit wide Vector-FPU |
LatticeMico32 | 2006 | 6 | Harvard architecture |
Nvidia Denver | 2014 | | Multicore, superscalar, 2-way decode, L2 |
Nvidia Carmel | 2018 | | Multicore, 10-way superscalar, L3 |
POWER1 | 1990 | | Superscalar, out-of-order execution |
POWER3 | 1998 | | Superscalar, out-of-order execution |
POWER4 | 2001 | | Superscalar, speculative execution, out-of-order execution |
POWER5 | 2004 | | 2-way simultaneous multithreading, out-of-order execution, integrated memory controller |
IBM POWER6 | 2007 | | 2-way simultaneous multithreading, in-order execution, up to 5 GHz |
IBM POWER7+ | | | Multi-core, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal fixed-point unit, Turbo Core, decimal floating-point unit |
IBM POWER8 | 2013 | 15–23 | Superscalar, L4 cache |
IBM POWER9 | 2017 | 12–16 | Superscalar, out-of-order execution, L4 cache |
IBM Power10 | 2021 | | Superscalar |
IBM Cell | 2006 | | Multi-core, multithreading, 2-way simultaneous multithreading (PPE), Power Processor Element, Synergistic Processing Elements, Element Interconnect Bus, in-order execution |
IBM Cyclops64 | | | Multi-core, multithreading, 2 threads per core, in-order |
IBM zEnterprise zEC12 | 2012 | 15/16/17 | Multi-core, 6 cores per chip, up to 5.5 GHz, superscalar, out-of-order, 48 MB L3 cache, 384 MB shared L4 cache |
IBM A2 | | 15 | Multicore, 4-way simultaneous multithreaded |
PowerPC 401 | 1996 | 3 | |
PowerPC 405 | 1998 | 5 | |
PowerPC 440 | 1999 | 7 | |
PowerPC 470 | 2009 | 9 | Symmetric multiprocessing (SMP) |
PowerPC e300 | | 4 | Superscalar, branch prediction |
PowerPC e500 | | Dual 7 stage | Multi-core |
PowerPC e600 | | 3-issue 7 stage | Superscalar out-of-order execution, branch prediction |
PowerPC e5500 | 2010 | 4-issue 7 stage | Out-of-order, multi-core |
PowerPC e6500 | 2012 | | Multi-core |
PowerPC 603 | | 4 | 5 execution units, branch prediction, no SMP |
PowerPC 603q | 1996 | 5 | In-order |
PowerPC 604 | 1994 | 6 | Superscalar, out-of-order execution, 6 execution units, SMP support |
PowerPC 620 | 1997 | 5 | Out-of-order execution, SMP support |
PWRficient PA6T | 2007 | | Superscalar, out-of-order execution, 6 execution units |
R4000 | 1991 | 8 | Scalar |
StrongARM SA-110 | 1996 | 5 | Scalar, in-order |
SuperH SH2 | | 5 | |
SuperH SH2A | 2006 | 5 | Superscalar, Harvard architecture |
SPARC | | | Superscalar |
hyperSPARC | 1993 | | Superscalar |
SuperSPARC | 1992 | | Superscalar, in-order |
SPARC64 VI/VII/VII+ | 2007 | | Superscalar, out-of-order [6] |
UltraSPARC | 1995 | 9 | |
UltraSPARC T1 | 2005 | 6 | Open source, multithreading, multi-core, 4 threads per core, scalar, in-order, integrated memory controller, 1 FPU |
UltraSPARC T2 | 2007 | 8 | Open source, multithreading, multi-core, 8 threads per core |
SPARC T3 | 2010 | 8 | Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache, in-order, hardware random number generator |
Oracle SPARC T4 | 2011 | 16 | Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 8 cores per chip, out-of-order execution, 4 MB L3 cache, hardware random number generator |
Oracle SPARC T5 | 2013 | 16 | Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order execution, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, RAS features, 16 cryptography units per chip, hardware random number generator |
Oracle SPARC M5 | | 16 | Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order execution, 48 MB L3 cache, RAS features, stream-processing unit, hardware-assisted cryptographic acceleration, 6 cryptography units per chip, hardware random number generator |
Fujitsu SPARC64 X | | | Multithreading, multi-core, 2-way simultaneous multithreading, 16 cores per chip, out-of-order execution, 24 MB L2 cache, RAS features |
Imagination Technologies MIPS Warrior | | | |
VIA C7 | 2005 | | In-order execution |
VIA Nano (Isaiah) | 2008 | | Superscalar out-of-order execution, branch prediction, 7 execution units |
WinChip | 1997 | 4 | In-order execution |