Tesla (microarchitecture)

Last updated

Nvidia Tesla
GTX 295 PlatinumEdition.jpg
NVIDIA GeForce GTX 295 of the GeForce 200-line of graphics-cards, was the final major iteration featuring the Tesla microarchitecture (GT200-400-B3).
Release dateNovember 2006
Fabrication process90 nm, 80 nm, 65 nm, 55 nm, and 40 nm
History
Predecessor Curie
Successor Fermi
Support status
Unsupported
Photo of Nikola Tesla, eponym of architecture Tesla circa 1890.jpeg
Photo of Nikola Tesla, eponym of architecture

Tesla is the codename for a GPU microarchitecture developed by Nvidia, and released in 2006, as the successor to Curie microarchitecture. It was named after the pioneering electrical engineer Nikola Tesla. As Nvidia's first microarchitecture to implement unified shaders, it was used with GeForce 8 series, GeForce 9 series, GeForce 100 series, GeForce 200 series, and GeForce 300 series of GPUs, collectively manufactured in 90 nm, 80 nm, 65 nm, 55 nm, and 40 nm. It was also in the GeForce 405 and in the Quadro FX, Quadro x000, Quadro NVS series, and Nvidia Tesla computing modules.

Contents

Tesla replaced the old fixed-pipeline microarchitectures, represented at the time of introduction by the GeForce 7 series. It competed directly with AMD's first unified shader microarchitecture named TeraScale, a development of ATI's work on the Xbox 360 which used a similar design. Tesla was followed by Fermi.

Overview

Tesla is Nvidia's first microarchitecture implementing the unified shader model. The driver supports Direct3D 10 Shader Model 4.0 / OpenGL 2.1 (later drivers have OpenGL 3.3 support) architecture. The design is a major shift for NVIDIA in GPU functionality and capability, the most obvious change being the move from the separate functional units (pixel shaders, vertex shaders) within previous GPUs to a homogeneous collection of universal floating point processors (called "stream processors") that can perform a more universal set of tasks.

GPU NVIDIA G80 NVIDIA G80 GPU Core.jpg
GPU NVIDIA G80
Die shot of the GT200 GPU found inside NVIDIA GeForce GTX 280 cards, based on the Tesla microarchitecture NVIDIA@65nm@Tesla@GT200@GeForce GTX 280@18054233 0817A2 S TAIWAN NH1888.M01 G200-300-A2 DSCx11 polysilicon microscope stitched@2.5x.jpg
Die shot of the GT200 GPU found inside NVIDIA GeForce GTX 280 cards, based on the Tesla microarchitecture

GeForce 8's unified shader architecture consists of a number of stream processors (SPs). Unlike the vector processing approach taken with older shader units, each SP is scalar and thus can operate only on one component at a time. This makes them less complex to build while still being quite flexible and universal. Scalar shader units also have the advantage of being more efficient in a number of cases as compared to previous generation vector shader units that rely on ideal instruction mixture and ordering to reach peak throughput. The lower maximum throughput of these scalar processors is compensated for by efficiency and by running them at a high clock speed (made possible by their simplicity). GeForce 8 runs the various parts of its core at differing clock speeds (clock domains), similar to the operation of the previous GeForce 7 series GPUs. For example, the stream processors of GeForce 8800 GTX operate at a 1.35 GHz clock rate while the rest of the chip is operating at 575 MHz. [1]

GeForce 8 performs significantly better texture filtering than its predecessors that used various optimizations and visual tricks to speed up rendering without impairing filtering quality. The GeForce 8 line correctly renders an angle-independent anisotropic filtering algorithm along with full trilinear texture filtering. G80, though not its smaller brethren, is equipped with much more texture filtering arithmetic ability than the GeForce 7 series. This allows high-quality filtering with a much smaller performance hit than previously. [1]

NVIDIA has also introduced new polygon edge anti-aliasing methods, including the ability of the GPU's ROPs to perform both Multisample anti-aliasing (MSAA) and HDR lighting at the same time, correcting various limitations of previous generations. GeForce 8 can perform MSAA with both FP16 and FP32 texture formats. GeForce 8 supports 128-bit HDR rendering, an increase from prior cards' 64-bit support. The chip's new anti-aliasing technology, called coverage sampling AA (CSAA), uses Z, color, and coverage information to determine final pixel color. This technique of color optimization allows 16X CSAA to look crisp and sharp. [2]

Performance

The claimed theoretical single-precision processing power for Tesla-based cards given in FLOPS may be hard to reach in real-world workloads. [3]

In G80/G90/GT200, each Streaming Multiprocessor (SM) contains 8 Shader Processors (SP, or Unified Shader, or CUDA Core) and 2 Special Function Units (SFU). Each SP can fulfill up to two single-precision operations per clock: 1 Multiply and 1 Add, using a single MAD instruction. Each SFU can fulfill up to four operations per clock: four MUL (Multiply) instructions. So one SM as a whole can execute 8 MADs (16 operations) and 8 MULs (8 operations) per clock, or 24 operations per clock, which is (relatively speaking) 3 times the number of SPs. Therefore, to calculate the theoretical dual-issue MAD+MUL performance in floating point operations per second [FLOPSsp+sfu, GFLOPS] of a graphics card with SP count [n] and shader frequency [f, GHz], the formula is: FLOPSsp+sfu = 3 × n × f. [4] [5]

However leveraging dual-issue performance like MAD+MUL is problematic:

For these reasons, in order to estimate the performance of real-world workloads, it may be more helpful to ignore the SFU and to assume only 1 MAD (2 operations) per SP per cycle. In this case the formula to calculate the theoretical performance in floating point operations per second becomes: FLOPSsp = 2 × n × f.

The theoretical double-precision processing power of a Tesla GPU is 1/8 of the single precision performance on GT200; there is no double precision support on G8x and G9x. [9]

Video decompression/compression

NVDEC

NVENC

NVENC was only introduced in later chips.

Chips

See also

Related Research Articles

<span class="mw-page-title-main">GeForce</span> Brand of GPUs by Nvidia

GeForce is a brand of graphics processing units (GPUs) designed by Nvidia and marketed for the performance market. As of the GeForce 40 series, there have been eighteen iterations of the design. The first GeForce products were discrete GPUs designed for add-on graphics boards, intended for the high-margin PC gaming market, and later diversification of the product line covered all tiers of the PC graphics market, ranging from cost-sensitive GPUs integrated on motherboards, to mainstream add-in retail boards. Most recently, GeForce technology has been introduced into Nvidia's line of embedded application processors, designed for electronic handhelds and mobile handsets.

<span class="mw-page-title-main">Quadro</span> Brand of Nvidia graphics cards used in workstations

Quadro was Nvidia's brand for graphics cards intended for use in workstations running professional computer-aided design (CAD), computer-generated imagery (CGI), digital content creation (DCC) applications, scientific calculations and machine learning from 2000 to 2020.

<span class="mw-page-title-main">GeForce 8 series</span> Series of GPUs by Nvidia

The GeForce 8 series is the eighth generation of Nvidia's GeForce line of graphics processing units. The third major GPU architecture developed by Nvidia, Tesla represents the company's first unified shader architecture.

<span class="mw-page-title-main">GeForce 9 series</span> Series of GPUs by Nvidia

The GeForce 9 series is the ninth generation of Nvidia's GeForce line of graphics processing units, the first of which was released on February 21, 2008. Products are based on a slightly repolished Tesla microarchitecture, adding PCIe 2.0 support, improved color and z-compression, and built on a 65 nm process, later using 55 nm process to reduce power consumption and die size.

PureVideo is Nvidia's hardware SIP core that performs video decoding. PureVideo is integrated into some of the Nvidia GPUs, and it supports hardware decoding of multiple video codec standards: MPEG-2, VC-1, H.264, HEVC, and AV1. PureVideo occupies a considerable amount of a GPU's die area and should not be confused with Nvidia NVENC. In addition to video decoding on chip, PureVideo offers features such as edge enhancement, noise reduction, deinterlacing, dynamic contrast enhancement and color enhancement.

<span class="mw-page-title-main">GeForce 200 series</span> Series of GPUs by Nvidia

The GeForce 200 series is a series of Tesla-based GeForce graphics processing units developed by Nvidia.

<span class="mw-page-title-main">GeForce 400 series</span> Series of GPUs by Nvidia

The GeForce 400 series is a series of graphics processing units developed by Nvidia, serving as the introduction of the Fermi microarchitecture. Its release was originally slated in November 2009, however, after delays, it was released on March 26, 2010, with availability following in April 2010.

<span class="mw-page-title-main">GeForce 500 series</span> Series of GPUs by Nvidia

The GeForce 500 series is a series of graphics processing units developed by Nvidia, as a refresh of the Fermi based GeForce 400 series. It was first released on November 9, 2010 with the GeForce GTX 580.

<span class="mw-page-title-main">GeForce 600 series</span> Series of GPUs by Nvidia

The GeForce 600 series is a series of graphics processing units developed by Nvidia, first released in 2012. It served as the introduction of the Kepler architecture. It is succeeded by the GeForce 700 series.

<span class="mw-page-title-main">GeForce 700 series</span> Series of GPUs by Nvidia

The GeForce 700 series is a series of graphics processing units developed by Nvidia. While mainly a refresh of the Kepler microarchitecture, some cards use Fermi (GF) and later cards use Maxwell (GM). GeForce 700 series cards were first released in 2013, starting with the release of the GeForce GTX Titan on February 19, 2013, followed by the GeForce GTX 780 on May 23, 2013. The first mobile GeForce 700 series chips were released in April 2013.

<span class="mw-page-title-main">Fermi (microarchitecture)</span> GPU microarchitecture by Nvidia

Fermi is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia, first released to retail in April 2010, as the successor to the Tesla microarchitecture. It was the primary microarchitecture used in the GeForce 400 series and GeForce 500 series. All desktop Fermi GPUs were manufactured in 40nm, mobile Fermi GPUs in 40nm and 28nm. Fermi is the oldest microarchitecture from NVIDIA that received support for Microsoft's rendering API Direct3D 12 feature_level 11.

The GeForce 800M series is a family of graphics processing units by Nvidia for laptop PCs. It consists of rebrands of mobile versions of the GeForce 700 series and some newer chips that are lower end compared to the rebrands.

<span class="mw-page-title-main">GeForce 10 series</span> Series of GPUs by Nvidia

The GeForce 10 series is a series of graphics processing units developed by Nvidia, initially based on the Pascal microarchitecture announced in March 2014. This design series succeeded the GeForce 900 series, and is succeeded by the GeForce 16 series and GeForce 20 series using the Turing microarchitecture.

<span class="mw-page-title-main">Kepler (microarchitecture)</span> GPU microarchitecture by Nvidia

Kepler is the codename for a GPU microarchitecture developed by Nvidia, first introduced at retail in April 2012, as the successor to the Fermi microarchitecture. Kepler was Nvidia's first microarchitecture to focus on energy efficiency. Most GeForce 600 series, most GeForce 700 series, and some GeForce 800M series GPUs were based on Kepler, all manufactured in 28 nm. Kepler found use in the GK20A, the GPU component of the Tegra K1 SoC, and in the Quadro Kxxx series, the Quadro NVS 510, and Nvidia Tesla computing modules.

<span class="mw-page-title-main">Nvidia Tesla</span> Nvidias line of general purpose GPUs

Nvidia Tesla is a discontinued line of products developed by Nvidia targeted at stream processing or general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began using GPUs from the G80 series, and have continued to accompany the release of new chips. They are programmable using the CUDA or OpenCL APIs.

<span class="mw-page-title-main">Maxwell (microarchitecture)</span> GPU microarchitecture by Nvidia

Maxwell is the codename for a GPU microarchitecture developed by Nvidia as the successor to the Kepler microarchitecture. The Maxwell architecture was introduced in later models of the GeForce 700 series and is also used in the GeForce 800M series, GeForce 900 series, and Quadro Mxxx series, as well as some Jetson products.

<span class="mw-page-title-main">Pascal (microarchitecture)</span> GPU microarchitecture by Nvidia

Pascal is the codename for a GPU microarchitecture developed by Nvidia, as the successor to the Maxwell architecture. The architecture was first introduced in April 2016 with the release of the Tesla P100 (GP100) on April 5, 2016, and is primarily used in the GeForce 10 series, starting with the GeForce GTX 1080 and GTX 1070, which were released on May 17, 2016, and June 10, 2016, respectively. Pascal was manufactured using TSMC's 16 nm FinFET process, and later Samsung's 14 nm FinFET process.

TeraScale is the codename for a family of graphics processing unit microarchitectures developed by ATI Technologies/AMD and their second microarchitecture implementing the unified shader model following Xenos. TeraScale replaced the old fixed-pipeline microarchitectures and competed directly with Nvidia's first unified shader microarchitecture named Tesla.

<span class="mw-page-title-main">Turing (microarchitecture)</span> GPU microarchitecture by Nvidia

Turing is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is named after the prominent mathematician and computer scientist Alan Turing. The architecture was first introduced in August 2018 at SIGGRAPH 2018 in the workstation-oriented Quadro RTX cards, and one week later at Gamescom in consumer GeForce 20 series graphics cards. Building on the preliminary work of Volta, its HPC-exclusive predecessor, the Turing architecture introduces the first consumer products capable of real-time ray tracing, a longstanding goal of the computer graphics industry. Key elements include dedicated artificial intelligence processors and dedicated ray tracing processors. Turing leverages DXR, OptiX, and Vulkan for access to ray tracing. In February 2019, Nvidia released the GeForce 16 series GPUs, which utilizes the new Turing design but lacks the RT and Tensor cores.

References

  1. 1 2 Wasson, Scott. NVIDIA's GeForce 8800 graphics processor Archived 15 July 2007 at the Wayback Machine , Tech Report, 8 November 2007.
  2. Sommefeldt, Rys.NVIDIA G80: Image Quality Analysis, Beyond3D, 12 December 2006.
  3. "Beyond3D - NVIDIA GT200 GPU and Architecture Analysis". www.beyond3d.com.
  4. 1 2 Anand Lal Shimpi & Derek Wilson. "Derek Gets Technical: 15th Century Loom Technology Makes a Comeback - NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260".
  5. Anand Lal Shimpi & Derek Wilson. "G80: A Mile High Overview - NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10".
  6. Sommefeldt, Rys. NVIDIA G80: Architecture and GPU Analysis - Page 11, Beyond3D, 8 November 2006
  7. "Technical Brief NVIDIA GeForce GTX 200 GPU Architectural Overview" (PDF). May 2008. p. 15. Retrieved 5 December 2015. The individual streaming processing cores of GeForce GTX 200 GPUs can now perform near full-speed dual-issue of multiply-add operations (MADs) and MULs (3 flops/SP)
  8. Kanter, David (8 September 2008). "NVIDIA's GT200: Inside a Parallel Processor". Real World Tech. p. 9.
  9. Smith, Ryan (17 March 2015). "The NVIDIA GeForce GTX Titan X Review". AnandTech . p. 2.