AMD FireStream

AMD FireStream was AMD's brand name for their Radeon-based product line targeting stream processing and/or GPGPU in supercomputers. Originally developed by ATI Technologies around the Radeon X1900 XTX in 2006, the product line was previously branded as both ATI FireSTREAM and AMD Stream Processor. [1] The AMD FireStream can also be used as a floating-point co-processor for offloading CPU calculations, which is part of the Torrenza initiative. The FireStream line has been discontinued since 2012, when GPGPU workloads were entirely folded into the AMD FirePro line.

Overview

The FireStream line is a series of add-on expansion cards released from 2006 to 2010, based on standard Radeon GPUs but designed to serve as a general-purpose co-processor, rather than rendering and outputting 3D graphics. Like the FireGL/FirePro line, they were given more memory and memory bandwidth, but the FireStream cards do not necessarily have video output ports. All support 32-bit single-precision floating point, and all but the first release support 64-bit double-precision. The line was partnered with new APIs to provide higher performance than existing OpenGL and Direct3D shader APIs could provide, beginning with Close to Metal, followed by OpenCL and the Stream Computing SDK, and eventually integrated into the APP SDK.

For highly parallel floating-point workloads, the cards can speed up large computations by more than 10 times; Folding@Home, the earliest and one of the most visible users of GPGPU, obtained 20-40 times the CPU performance. [2] Each pixel and vertex shader, or unified shader in later models, can perform arbitrary floating-point calculations.

History

Following the release of the Radeon R520 and GeForce G70 GPU cores with programmable shaders, their large floating-point throughput drew attention from academic and commercial groups, who experimented with using them for non-graphics work. The interest led ATI (and Nvidia) to create GPGPU products, able to calculate general-purpose mathematical formulas in a massively parallel way, to process heavy calculations traditionally done on CPUs and specialized floating-point math co-processors. GPGPUs were projected to have immediate performance gains of a factor of 10 or more compared to contemporary multi-socket CPU-only calculation.

With the development of the high-performance X1900 XTX nearly finished, ATI based its first Stream Processor design on it, announcing it as the upcoming ATI FireSTREAM together with the new Close to Metal API at SIGGRAPH 2006. [3] The core itself was mostly unchanged, except for doubling the onboard memory and bandwidth, similar to the FireGL V7350; new driver and software support made up most of the difference. Folding@home began using the X1900 for general computation, using a pre-release of version 6.5 of the ATI Catalyst driver, and reported a 20-40x improvement for the GPU over the CPU. [2] The first product was released in late 2006, rebranded as AMD Stream Processor after the merger with AMD. [4]

The brand became AMD FireStream with the second generation of stream processors in 2007, based on the RV670 chip with new unified shaders and double-precision support. [5] Asynchronous DMA also improved performance by allowing a larger memory pool without the CPU's help. One model was released, the 9170, for the initial price of $1999. Plans included the development of a stream processor on an MXM module by 2008, for laptop computing, [6] but it was never released.

The third generation quickly followed in 2008 with dramatic performance improvements from the RV770 core; the 9250 had nearly double the performance of the 9170, and became the first single-chip teraflop processor, despite dropping the price to under $1000. [7] A faster sibling, the 9270, was released shortly after, for $1999.

In 2010 the final generation of FireStreams came out, the 9350 and 9370 cards, based on the Cypress chip featured in the HD 5800. This generation again doubled the performance relative to the previous, to 2 teraflops in the 9350 and 2.6 teraflops in the 9370, [8] and was the first built from the ground up for OpenCL. This generation was also the only one to feature fully passive cooling, and active cooling was unavailable.

The Northern and Southern Islands generations were skipped, and in 2012, AMD announced that the new FirePro W (workstation) and S (server) series based on the new Graphics Core Next architecture would take the place of FireStream cards. [9]

Models

| Model (Codename) | Launch | Architecture (Fab) | Bus interface | Stream processors | Core clock (MHz) | Memory clock (MHz) | Memory size (MB) | Memory type | Bus width (bit) | Bandwidth (GB/s) | Single precision (GFLOPS) [a] | Double precision (GFLOPS) [a] | TDP (W) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stream Processor (R580) | 2006 | R500, 80 nm | | 240 | 600 | | 1024 | GDDR3 | 256 | 83.2 | 375 [10] | N/A | 165 |
| FireStream 9170 (RV670) [11] [12] | November 8, 2007 | TeraScale 1, 55 nm | PCIe 2.0 x16 | 320 | 800 | 800 | 2048 | GDDR3 | 256 | 51.2 | 512 | 102.4 | 105 |
| FireStream 9250 (RV770) [13] [14] | June 16, 2008 | TeraScale 1, 55 nm | PCIe 2.0 x16 | 800 | 625 | 993 | 1024 | GDDR3 | 256 | 63.6 | 1000 | 200 | 150 |
| FireStream 9270 (RV770) [15] [16] | November 13, 2008 | TeraScale 1, 55 nm | PCIe 2.0 x16 | 800 | 750 | 850 | 2048 | GDDR5 | 256 | 108.8 | 1200 | 240 | 160 |
| FireStream 9350 (Cypress XT) [17] | June 23, 2010 | TeraScale 2, 40 nm | PCIe 2.1 x16 | 1440 | 700 | 1000 | 2048 | GDDR5 | 256 | 128 | 2016 | 403.2 | 150 |
| FireStream 9370 (Cypress XT) [18] | June 23, 2010 | TeraScale 2, 40 nm | PCIe 2.1 x16 | 1600 | 825 | 1150 | 4096 | GDDR5 | 256 | 147.2 | 2640 | 528 | 225 |

  a. Precision performance is calculated from the base (or boost) core clock speed based on an FMA operation.
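
The footnote's calculation can be sketched in a few lines of Python. This is an illustrative reconstruction, not AMD's published method: it assumes that one fused multiply-add counts as two floating-point operations per stream processor per clock, and that GDDR3 moves two and GDDR5 four data transfers per listed memory clock. The function and variable names are made up for this example.

```python
# Transfers per listed memory clock. This is an assumption that happens to
# reproduce the bandwidth column: GDDR3 as double data rate, GDDR5 as quad.
TRANSFERS_PER_CLOCK = {"GDDR3": 2, "GDDR5": 4}


def peak_single_gflops(stream_processors, core_mhz):
    """Peak single-precision GFLOPS, counting one FMA as two operations."""
    return stream_processors * core_mhz * 2 / 1000


def bandwidth_gb_s(memory_mhz, memory_type, bus_width_bits):
    """Peak memory bandwidth in GB/s for the given clock, type and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return memory_mhz * TRANSFERS_PER_CLOCK[memory_type] * bytes_per_transfer / 1000


# (model, stream processors, core MHz, memory MHz, memory type, bus width in bits)
CARDS = [
    ("FireStream 9170", 320, 800, 800, "GDDR3", 256),
    ("FireStream 9250", 800, 625, 993, "GDDR3", 256),
    ("FireStream 9270", 800, 750, 850, "GDDR5", 256),
    ("FireStream 9350", 1440, 700, 1000, "GDDR5", 256),
    ("FireStream 9370", 1600, 825, 1150, "GDDR5", 256),
]

for model, sps, core, mem, mem_type, width in CARDS:
    gflops = peak_single_gflops(sps, core)
    gb_s = bandwidth_gb_s(mem, mem_type, width)
    print(f"{model}: {gflops:.0f} GFLOPS single, {gb_s:.1f} GB/s")
```

Under these assumptions the results match the single-precision and bandwidth columns for the five FireStream-branded cards. In the table, the double-precision figure for those cards is one fifth of the single-precision figure; that ratio is read off the table, not stated by the source. The 2006 Stream Processor row does not follow the same formula (240 × 600 MHz × 2 gives 288, not 375), and its figure is cited separately. [10]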

Software

The AMD FireStream was launched with a wide range of software platform support. One of the supporting firms was PeakStream (acquired by Google in June 2007), which was the first to provide an open beta version of software to support CTM and AMD FireStream as well as x86 and Cell (Cell Broadband Engine) processors. The FireStream was claimed to be 20 times faster than regular CPUs in typical applications when running PeakStream's software [citation needed]. RapidMind also provided stream processing software that worked with ATI and NVIDIA, as well as Cell processors. [19]

Software Development Kit

After abandoning its short-lived Close to Metal API, AMD focused on OpenCL. AMD first released its Stream Computing SDK (v1.0) in December 2007 under the AMD EULA, to be run on Windows XP. [19] The SDK included "Brook+", an AMD hardware-optimized version of the Brook language developed by Stanford University, itself a variant of ANSI C, open-sourced and optimized for stream computing. The AMD Core Math Library (ACML) and AMD Performance Library (APL), with optimizations for the AMD FireStream, and the COBRA video library (later renamed "Accelerated Video Transcoding" or AVT) for video transcoding acceleration were also to be included. Another important part of the SDK, the Compute Abstraction Layer (CAL), is a software development layer aimed at low-level access, through the CTM hardware interface, to the GPU architecture, for performance-tuning software written in various high-level programming languages.

In August 2011, AMD released version 2.5 of the AMD APP Software Development Kit, [19] which includes support for OpenCL 1.1, a parallel computing language developed by the Khronos Group. Compute shaders, officially called DirectCompute, in Microsoft's DirectX 11 API are already supported by graphics drivers with DirectX 11 support.

Benchmarks

In an AMD demonstration, a system [20] with two dual-core AMD Opteron processors and two Radeon R600 GPU cores running Microsoft Windows XP Professional achieved 1 teraflop (TFLOP) on a universal multiply-add (MADD) calculation. By comparison, an Intel Core 2 Quad Q9650 3.0 GHz processor at the time could achieve 48 GFLOPS. [21]
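
As a rough sanity check on the two figures quoted above, the ratio can be worked out directly. This is only arithmetic on the quoted numbers; the variable names are illustrative, and the comparison sets a four-processor demonstration system against a single CPU, so it is not a like-for-like benchmark.

```python
# Figures quoted in the paragraph above, both in GFLOPS.
demo_system_gflops = 1000.0  # 1 TFLOP: two Opterons plus two R600 GPU cores
core2_quad_gflops = 48.0     # Intel Core 2 Quad Q9650 at 3.0 GHz

# Ratio of the demonstrated system to the single CPU.
speedup = demo_system_gflops / core2_quad_gflops
print(f"{speedup:.1f}x")  # prints "20.8x"
```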

In a 2007 demonstration, Kaspersky SafeStream anti-virus scanning that had been optimized for AMD stream processors ran 21 times faster with RV670-based acceleration than with the search running entirely on an Opteron. [22]

References

  1. AMD Press Release
  2. Gasior, Geoff (October 16, 2006). "A closer look at Folding@home on the GPU". The Tech Report. Retrieved 2016-05-26.
  3. ATI SIGGRAPH 2006 Presentation (PDF) (Report). ATI Technologies.
  4. Valich, Theo (November 16, 2006). "ATI FireSTREAM AMD Stream board revealed". The Inquirer. Archived from the original on August 21, 2009. Retrieved 2016-05-26.
  5. "AMD Delivers First Stream Processor with Double Precision Floating Point Technology". AMD. November 8, 2007. Archived from the original on 2017-06-19. Retrieved 2016-05-26.
  6. AMD WW HPC 2007 presentation (PDF) (Report). p. 37.
  7. "AMD Stream Processor First to Break 1 Teraflop Barrier". AMD. June 16, 2008. Archived from the original on 2017-06-19. Retrieved 2016-05-26.
  8. "Newest AMD FireStream(TM) GPU Compute Accelerators Deliver Almost 2x Single and Double Precision Peak Performance and Performance Per Watt Over Last Generation". AMD. June 23, 2010. Archived from the original on 2017-06-19. Retrieved 2016-05-26.
  9. Smith, Ryan (14 August 2012). "The AMD Firepro W9000 W8000 Review Part 1". Anandtech.com. Retrieved 28 June 2016.
  10. "Beyond3D - ATI R580: Radeon X1900 XTX & Crossfire". www.beyond3d.com.
  11. "AMD Delivers First Stream Processor with Double Precision Floating Point Technology". AMD. November 8, 2007. Retrieved 2016-05-26.
  12. "AMD FireStream 9170 Specs". TechPowerUp.
  13. AMD FireStream 9250 - Product page Archived May 13, 2010, at the Wayback Machine
  14. "AMD FireStream 9250 Specs". TechPowerUp.
  15. AMD FireStream 9270 - Product page Archived February 16, 2010, at the Wayback Machine
  16. "AMD FireStream 9270 Specs". TechPowerUp.
  17. "AMD FireStream 9350 Specs". TechPowerUp.
  18. "AMD FireStream 9370 Specs". TechPowerUp.
  19. AMD APP SDK download page, archived 2012-09-03 at the Wayback Machine, and Stream Computing SDK EULA, archived March 6, 2009, at the Wayback Machine. Retrieved December 29, 2007.
  20. HardOCP report, archived 2016-03-04 at the Wayback Machine. Retrieved July 17, 2007.
  21. Intel microprocessor export compliance metrics
  22. Valich, Theo (September 12, 2007). "GPGPU drastically accelerates anti-virus software". The Inquirer. Archived from the original on September 23, 2009. Retrieved 2016-05-26.
  23. AMD Intermediate Language Reference Guide, August 2008