General information | |
---|---|
Launched | October 12, 2011 |
Common manufacturer | |
Architecture and classification | |
Technology node | 32 nm |
Instruction set | x86-64-v2 |
Physical specifications | |
Socket | |
Products, models, variants | |
Core names | |
History | |
Predecessor | Family 10h (K10) |
Successor | Piledriver - Family 15h (2nd-gen) |
The AMD Bulldozer Family 15h is a microprocessor microarchitecture for the FX and Opteron lines of processors, developed by AMD for the desktop and server markets. [1] [2] Bulldozer is the codename for this family of microarchitectures. It was released on October 12, 2011, as the successor to the K10 microarchitecture.
Bulldozer was designed from scratch rather than derived from earlier processors. [3] The core is specifically aimed at computing products with TDPs of 10 to 125 watts. AMD claimed dramatic performance-per-watt improvements in high-performance computing (HPC) applications with Bulldozer cores.
The Bulldozer cores support most of the instruction sets implemented by the Intel processors (Sandy Bridge) available at its introduction (including SSSE3, SSE4.1, SSE4.2, AES, CLMUL, and AVX), as well as new instruction sets proposed by AMD: ABM, XOP, FMA4 and F16C. [4] [5] Only the fourth-generation Bulldozer core (Excavator) supports the AVX2 instruction set.
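As a rough illustration of the instruction-set list above, the following sketch checks a CPU flags string for the extensions named in this paragraph. It assumes the flag spellings used in Linux's `/proc/cpuinfo` (for example `pclmulqdq` for CLMUL). The function name `classify_flags` and the sample string are hypothetical; the sample is hand-written, not captured from real hardware.

```python
# Extensions shared with Sandy Bridge, per the paragraph above.
# Spellings follow Linux /proc/cpuinfo (an assumption of this sketch).
BASELINE_FLAGS = {"ssse3", "sse4_1", "sse4_2", "aes", "pclmulqdq", "avx"}

# Extensions the text attributes to AMD.
AMD_FLAGS = {"abm", "xop", "fma4", "f16c"}


def classify_flags(flags_line: str) -> dict:
    """Report which of the listed extensions a flags string is missing."""
    present = set(flags_line.split())
    return {
        "missing_baseline": sorted(BASELINE_FLAGS - present),
        "missing_amd": sorted(AMD_FLAGS - present),
        "has_avx2": "avx2" in present,  # Excavator only, per the text
    }


# Hypothetical, hand-written flags string used only for demonstration.
sample = "fpu sse sse2 ssse3 sse4_1 sse4_2 aes pclmulqdq avx abm xop fma4 f16c"
report = classify_flags(sample)
print(report)
```

On a Linux system, the same function could be fed the `flags` line read from `/proc/cpuinfo` instead of the sample string.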
According to AMD, Bulldozer-based CPUs are built on GlobalFoundries' 32 nm silicon-on-insulator (SOI) process technology and reuse DEC's approach to multitasking performance, on the argument that it, according to press notes, "balances dedicated and shared computer resources to provide a highly compact, high units count design that is easily replicated on a chip for performance scaling." [6] In other words, by eliminating some of the "redundant" elements that naturally creep into multicore designs, AMD hoped to take better advantage of its hardware capabilities while using less power.
Bulldozer-based implementations built on 32 nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual-chip (16-core) Opteron processor codenamed Interlagos (for Socket G34) and the single-chip (4, 6 or 8 cores) Valencia (for Socket C32), while Zambezi (4, 6 and 8 cores) targeted desktops on Socket AM3+. [7] [8]
Bulldozer is the first major redesign of AMD's processor architecture since 2003, when the firm launched its K8 processors. Its basic building block features two 128-bit FMA-capable FPUs, which can be combined into one 256-bit FPU, accompanied by two integer clusters, each with 4 pipelines (the fetch/decode stage is shared). Bulldozer also introduced a shared L2 cache in the new architecture. AMD calls this design a "Module". A 16-core processor design would feature eight of these "modules", [9] but the operating system will recognize each "module" as two logical cores.
The modular architecture consists of a multithreaded shared L2 cache and the FlexFPU, which uses simultaneous multithreading. Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyper-Threading, where two virtual simultaneous threads share the resources of a single physical core. [10] [11]
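The module arithmetic described above can be written out as a small sketch. It only encodes the counts stated in the text (two integer cores, one shared FPU made of two 128-bit units, and one shared L2 cache per module). The class name `ModuleLayout` and its property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleLayout:
    """Resource counts for a chip built from Bulldozer-style modules."""

    modules: int

    @property
    def integer_cores(self) -> int:
        # Two single-threaded integer cores per module; the operating
        # system sees each of them as a logical core.
        return 2 * self.modules

    @property
    def shared_fpus(self) -> int:
        # One floating-point unit shared by both cores of a module.
        return self.modules

    @property
    def fp_units_128bit(self) -> int:
        # Each shared FPU is a pair of 128-bit FMA-capable units.
        return 2 * self.modules

    @property
    def shared_l2_caches(self) -> int:
        # One L2 cache shared per module.
        return self.modules


chip = ModuleLayout(modules=8)
print(chip.integer_cores, chip.shared_fpus, chip.shared_l2_caches)
```

With eight modules this gives the 16-core configuration mentioned in the text, with eight shared FPUs and eight shared L2 caches.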
In a retrospective review, Jeremy Laird of APC magazine commented on Bulldozer's problems, noting that it was slower than the outgoing Phenom II (K10) design and that the PC software ecosystem had not yet "embraced" the multi-threaded model. By his account, these problems caused heavy losses for AMD: the company lost over 1 billion USD in 2012, and some industry observers were predicting bankruptcy by mid-2015. The company later returned to profit. The reasons given for the recovery were the earlier divestiture of in-house manufacturing into GlobalFoundries, the subsequent outsourcing of manufacturing to TSMC, and the new Ryzen CPU design. [12]
Bulldozer made use of "Clustered Multithreading" (CMT), a technique where some parts of the processor are shared between two threads and some parts are unique to each thread. Earlier examples of this unconventional approach to multithreading can be traced back to Sun Microsystems' UltraSPARC T1 CPU of 2005. In terms of hardware complexity and functionality, a Bulldozer CMT module is equal to a dual-core processor in its integer calculation capabilities. In floating-point computational power, it is equal to either a single-core processor or a handicapped dual-core, depending on whether both threads running on the same CMT module are saturated with floating-point instructions and whether the FPU is performing 128-bit or 256-bit floating-point operations. The reason is that each pair of integer cores, that is, each module, shares a single floating-point unit consisting of a pair of 128-bit FMAC execution units.
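The dependence on thread count and operand width can be made concrete with an idealized model. The sketch below assumes only what the paragraph states: one module has two 128-bit FMAC units, and a 256-bit operation occupies both. It ignores scheduling, latency, and memory effects, so the numbers are peak figures for illustration, not measured throughput. The function name is hypothetical.

```python
FMAC_UNITS_PER_MODULE = 2   # pair of 128-bit FMAC units, per the text
FMAC_WIDTH_BITS = 128


def peak_fp_ops_per_thread(active_threads: int, op_width_bits: int) -> float:
    """Idealized peak FP operations per cycle available to each thread."""
    if active_threads not in (1, 2):
        raise ValueError("a module runs one or two threads")
    if op_width_bits not in (128, 256):
        raise ValueError("only 128-bit and 256-bit operations are modeled")
    # A 256-bit operation needs both 128-bit units at once.
    units_per_op = op_width_bits // FMAC_WIDTH_BITS
    module_ops_per_cycle = FMAC_UNITS_PER_MODULE / units_per_op
    # Two FP-saturated threads split the module's shared FPU evenly.
    return module_ops_per_cycle / active_threads


for threads in (1, 2):
    for width in (128, 256):
        print(threads, width, peak_fp_ops_per_thread(threads, width))
```

In this model a lone thread issuing 128-bit operations has both units to itself, while two threads issuing 256-bit operations each get half of one combined unit, which corresponds to the "handicapped dual-core" case described above.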
CMT follows a design philosophy similar to that of SMT but is in some ways simpler. Both designs try to utilize execution units efficiently, and in either method, when two threads compete for the same execution pipelines, one or more of the threads loses performance. Due to its dedicated integer cores, a Bulldozer family module performed roughly like a dual-core, dual-threaded processor during sections of code that were either wholly integer or a mix of integer and floating-point calculations. Due to the SMT use of the shared floating-point pipelines, however, the module would perform similarly to a single-core, dual-threaded SMT processor (SMT2) for a pair of threads saturated with floating-point instructions. (Both of these comparisons assume that the processor being compared has an equally wide and capable execution core, integer-wise and floating-point-wise, respectively.)
Both CMT and SMT are at peak effectiveness while running integer and floating-point code on a pair of threads. CMT stays at peak effectiveness while working on a pair of threads that both consist of integer code, while under SMT, one or both threads will underperform due to competition for integer execution units. The disadvantage of CMT is a greater number of idle integer execution units in the single-threaded case: CMT can use at most half of the integer execution units in its module, while SMT imposes no such limit. A large SMT core with integer circuitry as wide and fast as two CMT cores could in theory momentarily deliver up to twice the integer performance in the single-threaded case. (More realistically, for general code as a whole, Pollack's Rule estimates a speedup factor of √2, or an approximately 40% increase in performance.)
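The Pollack's Rule estimate can be checked with a short calculation. The sketch assumes the usual statement of the rule, that single-thread performance scales roughly with the square root of core complexity, and treats "integer circuitry as wide as two CMT cores" as a complexity ratio of 2. The function name is hypothetical.

```python
import math


def pollack_speedup(complexity_ratio: float) -> float:
    """Estimated single-thread speedup for a core that is
    `complexity_ratio` times as complex, using Pollack's Rule
    (performance grows roughly as the square root of complexity)."""
    if complexity_ratio <= 0:
        raise ValueError("complexity ratio must be positive")
    return math.sqrt(complexity_ratio)


speedup = pollack_speedup(2.0)
percent_gain = (speedup - 1.0) * 100.0
print(f"speedup ~ {speedup:.3f}, gain ~ {percent_gain:.0f}%")
```

Doubling the integer resources therefore yields an estimated gain of about 40%, well short of the theoretical factor of two.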
CMT processors and a typical SMT processor are similar in their efficient shared use of the L2 cache between a pair of threads.
The longer pipeline allowed the Bulldozer family of processors to achieve a much higher clock frequency compared to its K10 predecessors. While this increased frequencies and throughput, the longer pipeline also increased latencies and increased branch misprediction penalties.
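The trade-off between clock frequency and misprediction penalty can be sketched with a simple cycles-per-instruction model. All numbers below (frequencies, penalties, branch fraction, misprediction rate) are hypothetical values chosen for illustration and are not measurements of K10 or Bulldozer.

```python
def instructions_per_second(freq_ghz: float, base_cpi: float,
                            branch_fraction: float, mispredict_rate: float,
                            penalty_cycles: float) -> float:
    """Throughput under a simple model where every mispredicted branch
    adds `penalty_cycles` to the base cycles per instruction."""
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return freq_ghz * 1e9 / cpi


# Hypothetical workload: 20% branches, 5% of them mispredicted.
workload = dict(base_cpi=1.0, branch_fraction=0.2, mispredict_rate=0.05)

# Hypothetical designs: the longer pipeline clocks 20% higher but pays
# a larger penalty for each mispredicted branch.
short_pipe = instructions_per_second(3.0, penalty_cycles=12, **workload)
long_pipe = instructions_per_second(3.6, penalty_cycles=20, **workload)

gain = long_pipe / short_pipe
print(f"throughput gain from the longer pipeline: {gain:.2f}x")
```

Under these assumed numbers the longer pipeline still comes out ahead, but the throughput gain is smaller than the 20% frequency gain because part of it is spent on the larger misprediction penalty.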
The issue widths (and peak instruction executions per cycle) of a Jaguar, K10, and Bulldozer core are 2, 3, and 4 respectively. This made Bulldozer a more superscalar design than Jaguar/Bobcat. However, due to K10's somewhat wider core (in addition to the lack of refinements and optimizations in a first-generation design), the Bulldozer architecture typically delivered somewhat lower IPC than its K10 predecessors. It was not until the refinements made in Piledriver and Steamroller that the IPC of the Bulldozer family distinctly began to exceed that of K10 processors such as Phenom II.
The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011. [32] The FX-4100, FX-6100, FX-8120 and FX-8150 were released in October 2011, with the remaining FX series processors released at the end of the first quarter of 2012.
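Model | [Modules/FPUs] Cores/threads | Freq. (GHz) | Max. turbo, full load (GHz) | Max. turbo, half load (GHz) | L2 cache | L3 (MB) | TDP (W) | DDR3 Memory | Turbo Core 2.0 | Socket |
---|---|---|---|---|---|---|---|---|---|---|
FX-8100 | [4]8 | 2.8 | 3.1 | 3.7 | 4× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-8120 | [4]8 | 3.1 | 3.4 | 4.0 | 4× 2MB | 8 | 125 | 1866 | Yes | AM3+ |
FX-8140 | [4]8 | 3.2 | 3.6 | 4.1 | 4× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-8150 | [4]8 | 3.6 | 3.9 | 4.2 | 4× 2MB | 8 | 125 | 1866 | Yes | AM3+ |
FX-8170 | [4]8 | 3.9 | 4.2 | 4.5 | 4× 2MB | 8 | 125 | 1866 | Yes | AM3+ |
FX-6100 | [3]6 | 3.3 | 3.6 | 3.9 | 3× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-6120 | [3]6 | 3.6 | 3.9 | 4.2 | 3× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-6130 | [3]6 | 3.6 | 3.8 | 3.9 | 3× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-6200 | [3]6 | 3.8 | 4.0 | 4.1 | 3× 2MB | 8 | 125 | 1866 | Yes | AM3+ |
FX-4100 | [2]4 | 3.6 | 3.7 | 3.8 | 2× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-4120 | [2]4 | 3.9 | 4.0 | 4.1 | 2× 2MB | 8 | 95 | 1866 | Yes | AM3+ |
FX-4130 | [2]4 | 3.8 | 3.9 | 4.0 | 2× 2MB | 4 | 125 | 1866 | Yes | AM3+ |
FX-4150 | [2]4 | 3.8 | | | 2× 2MB | 8 | 95/125 | 1866 | Yes | AM3+ |
FX-4170 | [2]4 | 4.2 | 4.3 | | 2× 2MB | 8 | 125 | 1866 | Yes | AM3+ |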
There are two series of Bulldozer-based processors for servers: the Opteron 4200 series (Socket C32, codenamed Valencia, with up to four modules) and the Opteron 6200 series (Socket G34, codenamed Interlagos, with up to eight modules). [35] [36]
In November 2015, AMD was sued under the California Consumers Legal Remedies Act and Unfair Competition Law for allegedly misrepresenting the specifications of Bulldozer chips. The class-action lawsuit, filed on 26 October in the US District Court for the Northern District of California, claims that each Bulldozer module is in fact a single CPU core with a few dual-core traits, rather than a true dual-core design. [37] In August 2019, AMD agreed to settle the suit for $12.1M. [38] [39]
On 24 October 2011, initial tests by Phoronix confirmed that the performance of the Bulldozer CPU was somewhat lower than expected. [40] In several tests, the CPU performed similarly to the older-generation Phenom 1060T.
The performance later substantially increased, as various compiler optimizations and CPU driver fixes were released. [41] [42]
The first Bulldozer CPUs were met with a mixed response. It was discovered that the FX-8150 performed poorly in benchmarks that were not highly threaded, falling behind the second-generation Intel Core i* series processors and being matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks, the FX-8150 performed on par with the Phenom II X6, and the Intel Core i7 2600K, depending on the benchmark. Given the overall more consistent performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The processor was found to be extremely power-hungry under load, especially when overclocked, compared to Intel's Sandy Bridge. [43] [44]
On 13 October 2011, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations", but showed benchmarks on actual applications where it outperformed the Sandy Bridge i7 2600K and AMD X6 1100T. [45]
In January 2012, Microsoft released two hotfixes for Windows 7 and Server 2008 R2 that marginally improve the performance of Bulldozer CPUs by addressing the thread scheduling concerns raised after the release of Bulldozer. [46] [47] [48]
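The text does not describe what the hotfixes changed, so the following is only an illustration of why thread placement matters on a module-based design, not a description of the Windows scheduler. It assumes a hypothetical numbering in which logical CPUs `2m` and `2m + 1` belong to module `m`; all function names are likewise hypothetical.

```python
def spread_first(n_threads: int, n_modules: int) -> list[int]:
    """Place one thread per module before doubling up any module."""
    if not 0 <= n_threads <= 2 * n_modules:
        raise ValueError("thread count must fit the available cores")
    first_cores = [2 * m for m in range(n_modules)]
    second_cores = [2 * m + 1 for m in range(n_modules)]
    return (first_cores + second_cores)[:n_threads]


def packed(n_threads: int, n_modules: int) -> list[int]:
    """Fill both cores of a module before moving to the next module."""
    if not 0 <= n_threads <= 2 * n_modules:
        raise ValueError("thread count must fit the available cores")
    return list(range(n_threads))


def modules_in_use(cpus: list[int]) -> int:
    """Count distinct modules touched by a list of logical CPU ids."""
    return len({cpu // 2 for cpu in cpus})


spread = spread_first(4, n_modules=4)
dense = packed(4, n_modules=4)
print(spread, modules_in_use(spread))
print(dense, modules_in_use(dense))
```

With four threads on a four-module chip, spreading gives every thread its own module (and its own front end, FPU and L2 cache), while packing leaves two modules idle and makes each pair of threads share one module's resources.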
On 6 March 2012, AMD posted a knowledge base article stating that there was a compatibility problem between FX processors and certain games on the widely used digital game distribution platform Steam. AMD stated that it had provided a BIOS update to several motherboard manufacturers (namely Asus, Gigabyte Technology, MSI, and ASRock) that would fix the problem. [49]
In September 2014, AMD CEO Rory Read conceded the Bulldozer design had not been a "game-changing part", and that AMD had to live with the design for four years. [50]
On 31 August 2011, AMD and a group of well-known overclockers including Brian McLachlan, Sami Mäkinen, Aaron Schradin, and Simon Solotko set a new world record for CPU frequency using the unreleased, overclocked FX-8150 Bulldozer processor. Before that day, the record stood at 8.309 GHz, but the Bulldozer combined with liquid helium cooling reached a new high of 8.429 GHz. The record was later raised to 8.58 GHz by Andre Yang using liquid nitrogen. [51] [52] On August 22, 2014, using an FX-8370 (Piledriver), The Stilt from Team Finland achieved a maximum CPU frequency of 8.722 GHz. [53]
The CPU clock frequency records set by overclocked Bulldozer CPUs were only broken almost a decade later by overclocks of Intel's 13th generation Core Raptor Lake CPUs in October 2022. [54]
Piledriver is the AMD codename for its improved second-generation microarchitecture based on Bulldozer. AMD Piledriver cores are found in the Socket FM2 Trinity and Richland based series of APUs and CPUs and the Socket AM3+ Vishera based FX-series of CPUs. Piledriver was the last generation in the Bulldozer family to be available for Socket AM3+ and to be available with an L3 cache. The Piledriver processors available for FM2 (and its mobile variant) sockets did not come with an L3 cache, as the L2 cache is the last-level cache for all FM2/FM2+ processors.
Steamroller is the AMD codename for its third-generation microarchitecture based on an improved version of Piledriver. Steamroller cores are found in the Socket FM2+ Kaveri based series of APUs and CPUs.
Excavator is the codename for the fourth-generation Bulldozer core. [55] Excavator was implemented as "Carrizo" A-series APUs, "Bristol Ridge" A-series APUs, and Athlon X4 CPUs. [56]
Opteron is AMD's x86 former server and workstation processor line, and was the first processor which supported the AMD64 instruction set architecture. It was released on April 22, 2003, with the SledgeHammer core (K8) and was intended to compete in the server and workstation markets, particularly in the same segment as the Intel Xeon processor. Processors based on the AMD K10 microarchitecture were announced on September 10, 2007, featuring a new quad-core configuration. The last released Opteron CPUs are the Piledriver-based Opteron 4300 and 6300 series processors, codenamed "Seoul" and "Abu Dhabi" respectively.
The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name Athlon, and the immediate successor to the Athlon XP. The Athlon 64 was the second processor to implement the AMD64 architecture and the first 64-bit processor targeted at the average consumer. Variants of the Athlon 64 have been produced for Socket 754, Socket 939, Socket 940, and Socket AM2. It was AMD's primary consumer CPU, and primarily competed with Intel's Pentium 4, especially the Prescott and Cedar Mill core revisions.
The Athlon 64 X2 is the first native dual-core desktop central processing unit (CPU) designed by Advanced Micro Devices (AMD). It was designed from scratch as native dual-core by using an already multi-CPU enabled Athlon 64, joining it with another functional core on one die, and connecting both via a shared dual-channel memory controller/north bridge and additional control logic. The initial versions are based on the E stepping model of the Athlon 64 and, depending on the model, have either 512 or 1024 KB of L2 cache per core. The Athlon 64 X2 can decode instructions for Streaming SIMD Extensions 3 (SSE3), except those few specific to Intel's architecture. The first Athlon 64 X2 CPUs were released in May 2005, in the same month as Intel's first dual-core processor, the Pentium D.
The AMD Family 10h, or K10, is a microprocessor microarchitecture by AMD based on the K8 microarchitecture. The first third-generation Opteron products for servers were launched on September 10, 2007, with the Phenom processors for desktops following and launching on November 11, 2007 as the immediate successors to the K8 series of processors.
AMD Accelerated Processing Unit (APU), formerly known as Fusion, is a series of 64-bit microprocessors from Advanced Micro Devices (AMD), combining a general-purpose AMD64 central processing unit (CPU) and 3D integrated graphics processing unit (IGPU) on a single die.
Socket G34 is a land grid array CPU socket designed by AMD to support AMD's multi-chip module Opteron 6000-series server processors. G34 was launched on March 29, 2010, alongside the initial grouping of Opteron 6100 processors designed for it. Socket G34 supports four DDR3 SDRAM channels, two for each die in the 1944 pin CPU package. Socket G34 is available in up to four-socket arrangements, which is a change from the Socket F CPUs supporting up to eight-socket arrangements. However, four Socket G34 CPUs have eight dies, which is identical to what eight Socket F CPUs have. AMD declined to extend Socket G34 to eight-way operation citing shrinking demand of the >4-socket market. AMD is targeting Socket G34 at the high-end two-socket market and the four-socket market. The lower-end two-socket market will be serviced by monolithic-die Socket C32 CPUs with half the core count as the equivalent Socket G34 CPUs.
Phenom II is a family of AMD's multi-core 45 nm processors using the AMD K10 microarchitecture, succeeding the original Phenom. Advanced Micro Devices released the Socket AM2+ version of Phenom II in December 2008, while Socket AM3 versions with DDR3 support, along with an initial batch of triple- and quad-core processors were released on February 9, 2009. Dual-processor systems require Socket F+ for the Quad FX platform. The next-generation Phenom II X6 was released on April 27, 2010.
Athlon II is a family of AMD multi-core 45 nm central processing units, which is aimed at the budget to mid-range market and is a complementary product lineup to the Phenom II.
AMD FX are a series of high-end AMD microprocessors for personal computers which debuted in 2011, claimed as AMD's first native 8-core desktop processor. The line was introduced with the Bulldozer microarchitecture at launch, and was then succeeded by its derivative Piledriver in 2012.
AMD Piledriver Family 15h is a microarchitecture developed by AMD as the second-generation successor to Bulldozer. It targets desktop, mobile and server markets. It is used for the AMD Accelerated Processing Unit, AMD FX, and the Opteron line of processors.
The AMD Jaguar Family 16h is a low-power microarchitecture designed by AMD. It is used in APUs succeeding the Bobcat Family microarchitecture in 2013 and being succeeded by AMD's Puma architecture in 2014. It is two-way superscalar and capable of out-of-order execution. It is used in AMD's Semi-Custom Business Unit as a design for custom processors and is used by AMD in four product families: Kabini aimed at notebooks and mini PCs, Temash aimed at tablets, Kyoto aimed at micro-servers, and the G-Series aimed at embedded applications. Both the PlayStation 4 and the Xbox One use SoCs based on the Jaguar microarchitecture, with more powerful GPUs than AMD sells in its own commercially available Jaguar APUs.
AMD Steamroller Family 15h is a microarchitecture developed by AMD for AMD APUs, which succeeded Piledriver in the beginning of 2014 as the third-generation Bulldozer-based microarchitecture. Steamroller APUs continue to use two-core modules as their predecessors, while aiming at achieving greater levels of parallelism.
Zen is the first iteration in the Zen family of computer processor microarchitectures from AMD. It was first used with their Ryzen series of CPUs in February 2017. The first Zen-based preview system was demonstrated at E3 2016, and first substantially detailed at an event hosted a block away from the Intel Developer Forum 2016. The first Zen-based CPUs, codenamed "Summit Ridge", reached the market in early March 2017, Zen-derived Epyc server processors launched in June 2017 and Zen-based APUs arrived in November 2017.