TeraScale is the codename for a family of graphics processing unit microarchitectures developed by ATI Technologies/AMD and their second microarchitecture implementing the unified shader model, following Xenos. TeraScale replaced the old fixed-pipeline microarchitectures and competed directly with Nvidia's first unified shader microarchitecture, Tesla. [1] [2]
TeraScale was used in the Radeon HD 2000 series (manufactured in 80 nm and 65 nm), Radeon HD 3000 (65 nm and 55 nm), Radeon HD 4000 (55 nm and 40 nm), and Radeon HD 5000 and Radeon HD 6000 (both 40 nm). TeraScale was also used in the AMD Accelerated Processing Units code-named "Brazos", "Llano", "Trinity" and "Richland", and is found in some products of the succeeding graphics card brands.
TeraScale is a VLIW SIMD architecture, whereas Tesla, like TeraScale's successor Graphics Core Next, is a RISC SIMD architecture. TeraScale implements HyperZ. [3]
An LLVM code generator (i.e. a compiler back-end) is available for TeraScale, [4] although it appears to be missing from LLVM's matrix. [5] Mesa 3D, for example, makes use of it.
Release date | May 2007 [citation needed] |
---|---|
History | |
Predecessor | Not publicly known [citation needed] |
Successor | TeraScale 2 |
Support status | Unsupported |
At SIGGRAPH 08 in December 2008, AMD employee Mike Houston described some of the TeraScale microarchitecture. [6]
At FOSDEM09, Matthias Hopf from AMD's technology partner SUSE Linux presented a slide regarding the programming of the open-source driver for the R600. [7]
Previous GPU architectures implemented fixed pipelines, i.e. there were distinct shader processors for each type of shader. TeraScale leverages many flexible shader processors which can be scheduled to process a variety of shader types, thereby significantly increasing GPU throughput (dependent on the application's instruction mix, as noted below). The R600 core processes vertex, geometry, and pixel shaders as outlined by the Direct3D 10.0 specification for Shader Model 4.0, in addition to full OpenGL 3.0 support. [8]
The new unified shader functionality is based upon a very long instruction word (VLIW) architecture in which the core executes operations in parallel. [9]
A shader cluster is organized into 5 stream processing units. Each stream processing unit can retire one single-precision floating-point MAD (or ADD or MUL) instruction per clock, as well as a dot product (DP, special-cased by combining ALUs) or an integer ADD. [10] The fifth unit is more complex and can additionally handle transcendental functions such as sine and cosine. [10] Each shader cluster can execute up to 6 instructions per clock cycle (peak), consisting of 5 shading instructions plus 1 branch. [10]
The VLIW architecture brings with it the classic challenge inherent to VLIW designs: maintaining optimal instruction flow. [9] In particular, the chip cannot co-issue instructions when one depends on the result of the other. GPU performance therefore depends heavily on the mixture of instructions used by the application and on how well the real-time compiler in the driver can organize those instructions. [10]
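The co-issue restriction can be illustrated with a toy instruction packer. This is a minimal sketch, not AMD's shader compiler: the instruction format, the `pack_bundles` name and the greedy in-order strategy are assumptions made for illustration. Only the five-slot width and the rule that an operation cannot share a bundle with one whose result it needs come from the description above.

```python
from typing import List, Set, Tuple

# Hypothetical toy instruction: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]

def pack_bundles(instrs: List[Instr], width: int = 5) -> List[List[Instr]]:
    """Greedily pack in-order instructions into bundles of at most `width` slots.

    A new bundle is started when the current one is full, or when the next
    instruction reads or overwrites a register written in the current bundle.
    """
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    written: Set[str] = set()
    for dst, srcs in instrs:
        depends = dst in written or any(s in written for s in srcs)
        if current and (len(current) == width or depends):
            bundles.append(current)
            current, written = [], set()
        current.append((dst, srcs))
        written.add(dst)
    if current:
        bundles.append(current)
    return bundles

# Five independent operations fill one bundle.
independent = [(f"r{i}", ("a", "b")) for i in range(5)]
# A chain in which each operation needs the previous result.
chain = [("r0", ("a", "b")), ("r1", ("r0", "b")), ("r2", ("r1", "b"))]

print(len(pack_bundles(independent)))
print(len(pack_bundles(chain)))
```

In this sketch the independent operations occupy a single bundle, while the dependency chain needs one bundle per operation and leaves four of the five slots idle each time, which is why the instruction mix matters so much for utilization.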
The R600 core includes 64 shader clusters, while the RV610 and RV630 cores have 8 and 24 shader clusters, respectively.
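As a quick arithmetic check, the cluster counts can be converted into stream processor counts using the five units per shader cluster described earlier. The dictionary and variable names here are illustrative only.

```python
UNITS_PER_CLUSTER = 5  # stream processing units per shader cluster, per the text

shader_clusters = {"R600": 64, "RV610": 8, "RV630": 24}

stream_processors = {
    chip: clusters * UNITS_PER_CLUSTER
    for chip, clusters in shader_clusters.items()
}
print(stream_processors)
```

The results (320, 40 and 120) agree with the stream processor figures given for these chips in the comparison table.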
TeraScale includes multiple units capable of carrying out tessellation. Those are similar to the programmable units of the Xenos GPU which is used in the Xbox 360.
Tessellation was officially specified in the major APIs starting with DirectX 11 and OpenGL 4. TeraScale 1 based GPUs (HD 2000, 3000 and 4000 series) conform only to Direct3D 10 and OpenGL 3.3 and therefore implement a different tessellation principle, which uses vendor-specific API extensions. [11] The TeraScale 2 based GPUs (starting with the Radeon HD 5000 series) were the first to conform to both the Direct3D 11 and OpenGL 4.0 tessellation techniques. [12] Although the TeraScale 1 tessellator is simpler in design, it is described by AMD as a subset of the later tessellation standard. [13]
The TeraScale tessellator units allow developers to take a simple polygon mesh and subdivide it using a curved surface evaluation function. There are different tessellation forms, such as Bézier surfaces with N-patches, B-splines and NURBS, as well as some surface subdivision techniques, which usually involve a displacement map, a kind of texture. [14] Essentially, this allows the polygon density of a simple, low-polygon model to be increased dramatically in real time with very small impact on performance. Scott Wasson of Tech Report noted during an AMD demo that the resulting model was so dense with millions of polygons that it appeared to be solid. [9]
The TeraScale tessellator is reminiscent of ATI TruForm, the brand name of an early hardware tessellation unit used initially in the Radeon 8500. [15]
ATI TruForm received little attention from software developers. A few games (such as Madden NFL 2004, Serious Sam, Unreal Tournament 2003 and 2004, and unofficially Morrowind) included support for ATI's tessellation technology. This slow adoption was due to the fact that the feature was not shared with NVIDIA GPUs, which had implemented a competing tessellation solution using Quintic-RT patches that achieved even less support from the major game developers. [16] Since the Xbox 360's GPU is based on ATI's architecture, Microsoft saw hardware-accelerated surface tessellation as a major GPU feature. A couple of years later, tessellation became mandatory with the release of DirectX 11 in 2009. [14] [17]
While the tessellation principle introduced with TeraScale was not part of the OpenGL 3.3 or Direct3D 10.0 requirements, and competitors such as the GeForce 8 series lacked similar hardware, Microsoft added tessellation to its future plans for DirectX 10.1. [17] In the end, Microsoft introduced tessellation as a required capability not with DirectX 10.1 but with DirectX 11. [18]
The GCN geometric processor is the most current solution for carrying out tessellation on the GPU from AMD, which acquired ATI's GPU business.
Although the R600 is a significant departure from previous designs, it still shares many features with its predecessor, the Radeon R520. [9] The Ultra-Threaded Dispatch Processor is a major architectural component of the R600 core, just as it was with the Radeon X1000 GPUs. This processor manages a large number of in-flight threads of three distinct types (vertex, geometry, and pixel shaders) and switches amongst them as needed. [9] With a large number of threads being managed simultaneously, it is possible to reorganize thread order to optimally utilize the shaders. In other words, the dispatch processor evaluates what is going on in the other parts of the R600 and attempts to keep processing efficiency as high as possible. There are lower levels of management as well; each SIMD array of 80 stream processors has its own sequencer and arbiter. The arbiter decides which thread to process next, while the sequencer attempts to reorder instructions for the best possible performance within each thread. [9]
Texturing and final output aboard the R600 core are similar to, but also distinct from, those of the R580. R600 is equipped with 4 texture units that are decoupled (independent) from the shader core, as in the R520 and R580 GPUs. [9] The render output units (ROPs) of the Radeon HD 2000 series now perform multisample anti-aliasing (MSAA) with programmable sample grids and a maximum of 8 sample points, instead of using pixel shaders as in the Radeon X1000 series. Also new is the capability to filter FP16 textures, popular with HDR lighting, at full speed. The ROPs can also perform trilinear and anisotropic filtering on all texture formats. On R600, this totals 16 pixels per clock for FP16 textures, while higher-precision FP32 textures filter at half speed (8 pixels per clock). [9]
Anti-aliasing capabilities are more robust on R600 than on the R520 series. In addition to the ability to perform 8× MSAA, up from 6× MSAA on the R300 through R580, R600 has a new custom filter anti-aliasing (CFAA) mode. CFAA refers to an implementation of non-box filters that look at pixels around the particular pixel being processed in order to calculate the final color and anti-alias the image. [10] CFAA is performed by the shaders instead of in the ROPs. This greatly enhances programmability because the filters can be customized, but may also cause performance issues because it consumes shader resources. At the launch of R600, CFAA used wide and narrow tent filters. With these, samples from outside the pixel being processed are weighted linearly based upon their distance from the centroid of that pixel, with the linear function adjusted based on the wide or narrow filter chosen. [10]
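The linear weighting of a tent filter can be sketched as follows. This is only an illustration of the principle described above: the function names, the radii chosen for "narrow" and "wide", and the normalization by the sum of weights are assumptions, not details of AMD's CFAA implementation.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, value), coordinates in pixel units

def tent_weight(distance: float, radius: float) -> float:
    """Linear falloff: 1 at the pixel centroid, 0 at `radius` and beyond."""
    return max(0.0, 1.0 - distance / radius)

def resolve_pixel(samples: List[Sample], center: Tuple[float, float],
                  radius: float) -> float:
    """Weighted average of samples around `center` using a tent filter."""
    cx, cy = center
    total = 0.0
    weight_sum = 0.0
    for x, y, value in samples:
        w = tent_weight(math.hypot(x - cx, y - cy), radius)
        total += w * value
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0

# One sample at the pixel centroid and one in the neighbouring pixel.
samples = [(0.5, 0.5, 1.0), (1.5, 0.5, 0.0)]
narrow = resolve_pixel(samples, center=(0.5, 0.5), radius=1.0)  # hypothetical radius
wide = resolve_pixel(samples, center=(0.5, 0.5), radius=2.0)    # hypothetical radius
print(narrow, wide)
```

With the narrower radius the neighbouring sample receives zero weight, while the wider radius lets it contribute, which pulls the resolved value toward the neighbour and softens the edge.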
Memory controllers are connected via an internal bi-directional ring bus wrapped around the processor. In the Radeon HD 2900, it is a 1,024-bit bi-directional ring bus (512-bit read and 512-bit write), with eight 64-bit memory channels for a total bus width of 512 bits on the 2900 XT; [9] in the Radeon HD 3800, it is a 512-bit ring bus; in the Radeon HD 2600 and HD 3600, it is a 256-bit ring bus; in the Radeon HD 2400 and HD 3400, there is no ring bus.
The series saw a half-generation update with die shrink (55 nm) variants: RV670, RV635 and RV620. All variants support PCI Express 2.0, DirectX 10.1 with Shader Model 4.1 features, dedicated ATI Unified Video Decoder (UVD) for all models [19] and PowerPlay technology for desktop video cards. [20]
Except for the Radeon HD 3800 series, all variants supported 2 integrated DisplayPort outputs, supporting 24- and 30-bit displays at resolutions up to 2,560×1,600. Each output included 1, 2, or 4 lanes, with a data rate of up to 2.7 Gbit/s per lane.
ATI claimed that DirectX 10.1 support can bring improved performance and processing efficiency with reduced rounding error (0.5 ULP, compared with an average error of 1.0 ULP as the tolerable error), better image detail and quality, global illumination (a technique used in animated films), and further improvements to consumer gaming systems, giving a more realistic gaming experience. [21]
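A rough bandwidth estimate shows why the lane count matters at the highest supported resolution. This is a back-of-the-envelope sketch under stated assumptions: the 60 Hz refresh rate and the 8b/10b line coding (which leaves 80% of the raw rate for payload) are not taken from the text, blanking intervals are ignored, and all function names are illustrative.

```python
LANE_RATE_MBIT = 2700  # per-lane data rate from the text (2.7 Gbit/s)

def raw_link_rate(lanes: int) -> int:
    """Raw link rate in Mbit/s for a given lane count."""
    return lanes * LANE_RATE_MBIT

def payload_rate(lanes: int) -> int:
    """Usable rate in Mbit/s, assuming 8b/10b coding (an assumption)."""
    return raw_link_rate(lanes) * 8 // 10

def pixel_rate(width: int, height: int, bits_per_pixel: int, refresh_hz: int) -> float:
    """Pixel data rate in Mbit/s, ignoring blanking intervals."""
    return width * height * bits_per_pixel * refresh_hz / 1_000_000

needed = pixel_rate(2560, 1600, 30, 60)  # 60 Hz is an assumed refresh rate
for lanes in (1, 2, 4):
    print(lanes, payload_rate(lanes), payload_rate(lanes) >= needed)
```

Under these assumptions only the four-lane configuration carries a 30-bit 2,560×1,600 image at 60 Hz, while the one- and two-lane configurations suit lower resolutions or colour depths.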
Release date | September 2009 [citation needed] |
---|---|
History | |
Predecessor | TeraScale 1 |
Successor | TeraScale 3 |
Support status | Unsupported |
TeraScale 2 (VLIW5) was introduced with the Radeon HD 5000 series GPUs in the "Evergreen" generation.
At HPG10, Mark Fowler presented "Evergreen" and stated that, for example, the 5870 (Cypress), 5770 (Juniper) and 5670 (Redwood) support a maximum resolution of 6 × 2560×1600 pixels, while the 5470 (Cedar) supports 4 × 2560×1600 pixels, which is important for AMD Eyefinity multi-monitor support. [22]
With the release of Cypress, the TeraScale graphics engine architecture was upgraded with twice the number of stream cores, texture units and ROP units compared to the RV770. The architecture of the stream cores is largely unchanged, but adds support for DirectX 11/DirectCompute 11 capabilities with new instructions. [23] As in RV770, four texture units are tied to 16 stream cores (each having five processing elements, for a total of 80 processing elements). This combination is referred to as a SIMD core.
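The SIMD core composition can be checked with a little arithmetic. This sketch uses the per-SIMD figures from the paragraph above and the 1,600 stream processors listed for Cypress in the comparison table; the constant names are illustrative.

```python
STREAM_CORES_PER_SIMD = 16      # from the text
ELEMENTS_PER_STREAM_CORE = 5    # from the text (VLIW5)
TEXTURE_UNITS_PER_SIMD = 4      # from the text

CYPRESS_STREAM_PROCESSORS = 1600  # from the comparison table

elements_per_simd = STREAM_CORES_PER_SIMD * ELEMENTS_PER_STREAM_CORE
simd_cores = CYPRESS_STREAM_PROCESSORS // elements_per_simd
texture_units = simd_cores * TEXTURE_UNITS_PER_SIMD

print(elements_per_simd, simd_cores, texture_units)
```

The derived values (80 processing elements per SIMD core, 20 SIMD cores, 80 texture units) match the compute unit and texture mapping unit counts listed for Cypress in the table.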
Because DirectX 11 mandates full developer control over interpolation, the dedicated interpolators of the predecessor Radeon R700 were removed; interpolation relies instead on the SIMD cores. The stream cores can handle the fused multiply–add (FMA) instruction in both single and double precision, which rounds only once and therefore increases precision over multiply–add (MAD), and is compliant with the IEEE 754-2008 standard. [24] A sum of absolute differences (SAD) instruction has been added natively to the processors. It can be used to greatly improve the performance of some processes, such as video encoding and transcoding on the 3D engine. Each SIMD core is equipped with 32 KiB of local data share and 8 KiB of L1 cache, [23] while all SIMD cores share a 64 KiB global data share.
Each memory controller ties to two quad ROPs, one per 64-bit channel, and a dedicated 512 KiB L2 cache. [23]
AMD PowerPlay is supported.
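The difference between MAD and FMA, and the meaning of SAD, can be shown with a small host-side sketch. It does not model the GPU hardware: the fused operation is emulated in double precision with exact rational arithmetic so that only one rounding occurs, and all function names are illustrative.

```python
from fractions import Fraction
from typing import Sequence

def mad(a: float, b: float, c: float) -> float:
    """Multiply-add with two roundings: the product is rounded, then the sum."""
    product = a * b
    return product + c

def fma_emulated(a: float, b: float, c: float) -> float:
    """Fused multiply-add emulated exactly, with a single final rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def sad(xs: Sequence[int], ys: Sequence[int]) -> int:
    """Sum of absolute differences, e.g. between two blocks of pixel values."""
    return sum(abs(x - y) for x, y in zip(xs, ys))

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(mad(a, b, c))           # the rounded product is 1.0, so the small term is lost
print(fma_emulated(a, b, c))  # the exact product keeps the -2**-60 term
print(sad([10, 20, 30], [12, 18, 33]))
```

In video encoding, a SAD over two pixel blocks is a common measure of how well a candidate block matches a reference block, which is why a native instruction helps motion search.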
Release date | October 2010 [citation needed] |
---|---|
History | |
Predecessor | TeraScale 2 |
Successor | Graphics Core Next 1 |
Support status | Unsupported |
TeraScale 3 (VLIW4) replaces the previous 5-way VLIW designs with a 4-way VLIW design. The new design also incorporates an additional tessellation unit to improve Direct3D 11 performance.
TeraScale 3 was introduced in the Radeon HD 6900-branded graphics cards and is also implemented in the Trinity and Richland APUs.
AMD PowerTune, dynamic frequency scaling for GPUs, was introduced with the Radeon HD 6900 series on December 15, 2010 and has seen continued development, as documented in some reviews by AnandTech. [25] [26] [27] [28]
At HPG11 in August 2011 AMD employees Michael Mantor (Senior Fellow Architect) and Mike Houston (Fellow Architect) presented Graphics Core Next, the microarchitecture succeeding TeraScale. [29]
Microarchitecture | TeraScale 1 | TeraScale 2 | TeraScale 3 | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Chip1 | R600 | RV610 | RV620 | RV630 | RV635 | RV670 | RV710 | RV711 | RV730 | RV740 | RV770 | RV790 | Cedar (RV810) | Redwood (RV830) | Juniper (RV840) | Cypress (RV870) | Caicos (RV910) | Turks (RV930) | Barts (RV940) | Cayman (RV970) |
Code name | Pele | Laka | Koopa | Shaka | Wario | Boom | Luigi | Mario | Walden | Wekiva | Spartan | ? | ? | ? | ? | ? | ? | Victoria | ? | |
Chip variant(s) | — | M72 M74 | M82 | M76 | M86 | M88 | M92 | M93 | M96 | M97 | M98 | — | Park Robson | Capilano Madison Pinewood | Broadway Granville | Hemlock Lexington | Seymour | Onega Thames Whistler | Blackcomb | Antilles |
Fab (nm) | 80 | 65 | 55 | 65 | 55 | 40 | 55 | 40 | ||||||||||||
Die size (mm²) | 420 | 85 / 82 (M74) | 67 | 153 | 135 | 192 | 73 | 146 | 137 | 256 | 282 | 59 | 104 | 166 | 334 | 67 | 118 / 104 (Thames, Whistler) | 255 / 212 (Blackcomb) | 389 | |
Transistors (million) | 720 | 180 | 181 | 390 | 378 | 666 | 242 | 514 | 826 | 956 | 959 | 292 | 627 | 1,040 | 2,154 | 370 | 716 | 1,700 | 2,640 | |
Transistor density (MTr/mm²) | 1.7 | 2.1 / 2.2 (M74) | 2.7 | 2.5 | 2.8 | 3.5 | 3.3 | 3.5 | 6.0 | 3.7 | 3.4 | 4.9 | 6.0 | 6.3 | 6.4 | 5.5 | 6.1 / 6.9 (Thames, Whistler) | 6.7 / 8.0 (Blackcomb) | 6.8 | |
Compute units | 4 | 2 | 3 | 4 | 1 | 4 | 8 | 10 | 2 | 5 | 10 | 20 / 5 (Lexington) | 2 | 6 | 14 | 24 | ||||
Thread processors | 16 | 4 | 8 | 16 | 8 | 32 | 40 | 8 | 20 | 40 | 80 / 20 (Lexington) | 8 | 24 | 56 | 96 | |||||
Stream processors | 320 | 40 | 120 | 320 | 80 | 320 | 640 | 800 | 80 | 400 | 800 | 1600 / 400 (Lexington) | 160 | 480 | 1120 | 1536 | ||||
Texture mapping units | 16 | 4 | 8 | 16 | 8 | 32 | 40 | 8 | 20 | 40 | 80 / 20 (Lexington) | 8 | 24 | 56 | 96 | |||||
Render output units | 16 | 4 | 16 | 4 | 8 | 16 | 4 | 8 | 16 | 32 / 8 (Lexington) | 4 | 8 | 32 | 32 | ||||||
Z/Stencil OPS | 32 | 8 | 32 | 4 | 32 | 64 | 4 | 40 | 16 | 32 | 40 | 128 | ||||||||
L1 cache (KB) | 32 per 4 SPs (Stream processors) | 16 per CU (Compute unit) | 8 per CU | |||||||||||||||||
L2 cache (KB) | 256 | 32 | 64 | 128 | 256 | 64 | 128 | 256 | 128 | 256 | 512 / 256 (Lexington) | 128 | 256 | 512 | ||||||
Display Core Engine | 2.0 | 3.0 | 2.0 | 3.0 | 2.0 | 3.2 | 3.1 | 4.0 | 5.0 | |||||||||||
Unified Video Decoder | Avivo HD | 1.0 | 2.2 | 2.0 | 2.3 | 3.1 | ||||||||||||||
Initial launch | May 2007 | Jan 2007 | Jan 2008 | Jun 2007 | Jan 2008 | Nov 2007 | Sep 2008 | May 2010 | Sep 2008 | Apr 2009 | Jun 2008 | Apr 2009 | Feb 2010 | Jan 2010 | Oct 2009 | Sep 2009 | Feb 2011 | Oct 2010 | Dec 2010 | |
Series | R600 (Radeon HD 2000 / Radeon HD 3000) | R700 (Radeon HD 4000) | Evergreen (Radeon HD 5000) | Northern Islands (Radeon HD 6000) | ||||||||||||||||
References | [30] [31] | [32] [33] [34] [35] | [36] [37] [38] | [39] [40] [41] | [42] [43] [44] | [45] [46] [47] | [48] [49] [50] | [51] [52] | [53] [54] [55] | [56] [57] [58] | [59] [60] [61] | [62] [63] | [64] [65] [66] [67] | [68] [69] [70] [71] [72] | [73] [74] [75] | [76] [77] [78] [79] | [80] [81] [82] | [83] [84] [85] [86] [87] | [88] [89] [90] | [91] [92] [93] |
1 Duo chips such as R680 (2x RV670) and R700 (2x RV770) are not listed. [94] [95] [96] [97]
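The transistor density row of the table follows from the transistor count and die size rows. The sketch below recomputes it for four chips; the pairing of values with chips is read off the flattened table and should be treated as an assumption, and the names are illustrative.

```python
# (transistors in millions, die size in mm^2), as read from the table above
chips = {
    "R600": (720, 420),
    "RV770": (956, 256),
    "Cypress": (2154, 334),
    "Cayman": (2640, 389),
}

density = {
    name: round(transistors / area, 1)  # million transistors per mm^2
    for name, (transistors, area) in chips.items()
}
print(density)
```

The computed values (1.7, 3.7, 6.4 and 6.8 MTr/mm²) agree with the corresponding table entries.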