Launched | April 3, 2012 |
---|---|
Designed by | Nvidia |
Manufactured by | TSMC |
Fabrication process | TSMC 28 nm |
Product Series | |
Desktop | GeForce 600 series GeForce 700 series |
Professional/workstation | Quadro Kxxx series Quadro NVS 510 |
Server/datacenter | Tesla K series |
Specifications | |
L1 cache | 16 KB (per SM) |
L2 cache | Up to 1.5 MB |
Memory support | GDDR5 |
PCIe support | PCIe 2.0 PCIe 3.0 |
Supported Graphics APIs | |
DirectX | DirectX 12 (Feature Level 11_0) |
Shader Model | Shader Model 6.5 |
Vulkan | Vulkan 1.2 |
Media Engine | |
Encode codecs | H.264 |
Decode codecs | |
Encoder(s) supported | NVENC |
Display outputs | DVI DisplayPort 1.2 HDMI 1.4a |
History | |
Predecessor | Fermi |
Successor | Maxwell |
Kepler is the codename for a GPU microarchitecture developed by Nvidia, first introduced at retail in April 2012, [1] as the successor to the Fermi microarchitecture. Kepler was Nvidia's first microarchitecture to focus on energy efficiency. Most GeForce 600 series, most GeForce 700 series, and some GeForce 800M series GPUs were based on Kepler, all manufactured in 28 nm. Kepler found use in the GK20A, the GPU component of the Tegra K1 SoC, and in the Quadro Kxxx series, the Quadro NVS 510, and Tesla computing modules.
Kepler was followed by the Maxwell microarchitecture and used alongside Maxwell in the GeForce 700 series and GeForce 800M series.
The architecture is named after Johannes Kepler, a German mathematician and key figure in the 17th century scientific revolution.
Nvidia's previous architecture was designed to increase performance in compute and tessellation. With the Kepler architecture, Nvidia shifted its focus to efficiency, programmability, and performance. [2] [3] The efficiency aim was achieved through the use of a unified GPU clock, simplified static instruction scheduling, and a greater emphasis on performance per watt. [4] By abandoning the shader clock found in its previous GPU designs, Nvidia increased efficiency, even though the change requires additional cores to achieve the same level of performance. This is not only because the cores themselves are more power-friendly (two Kepler cores use about 90% of the power of one Fermi core, according to Nvidia's numbers), but also because the move to a unified GPU clock scheme delivers a 50% reduction in power consumption in that area. [5]
The programmability aim was achieved with Kepler's Hyper-Q, Dynamic Parallelism, and several new Compute Capability 3.x features. These enable higher GPU utilization and simplified code management on GK GPUs, allowing more flexibility in programming for Kepler GPUs. [6]
Finally, the performance aim was addressed with additional execution resources (more CUDA cores, registers, and cache) and with Kepler's ability to reach a memory clock speed of 7 GHz, increasing its performance compared to previous Nvidia GPUs. [5] [7]
The GK series GPUs contain features from both the older Fermi and newer Kepler generations. Kepler-based members add the following standard features:
- PCI Express 3.0 interface
- DisplayPort 1.2 and HDMI 1.4a display outputs
- NVENC hardware H.264 encoding block
- Next-generation streaming multiprocessor (SMX)
- GPU Boost
- TXAA
- Manufactured by TSMC on a 28 nm process
Kepler employs a new streaming multiprocessor architecture called SMX. CUDA execution core counts rose from 32 per SM (across 16 SMs) on Fermi to 192 per SMX (across 8 SMX units), while the register file was only doubled per SMX, to 65,536 × 32-bit, giving an overall lower register-to-core ratio. Between this and other compromises, despite the roughly 3× increase in CUDA cores and the clock increase (on the GTX 680 versus the Fermi-based GTX 580), actual performance gains in most operations were well under 3×. Dedicated FP64 CUDA cores are used, rather than treating two FP32 cores as a single unit as was done previously, and very few were included on the consumer models, resulting in FP64 throughput of 1/24 the FP32 rate. [9]
On the HPC models, the GK110/210, the SMX count was raised to 13-15 depending on the product, and more FP64 cores were included to bring the double-precision rate up to 1/3 of FP32. On the GK110, the per-thread register limit was quadrupled over Fermi to 255, but this still means a thread using half of that limit (about 128 registers) caps residency at 1/4 of each SMX's thread capacity. The GK210, released at the same time, increased the register limit to 512 to improve performance in high-register-pressure situations like this. The texture cache, which programmers had already been using as a read-only buffer for compute in previous generations, was increased in size and its data path optimized for faster throughput when used this way. All levels of memory, including the register file, are protected by single-bit ECC as well.
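As a rough illustration of this register-pressure trade-off, a kernel can ask the compiler to cap per-thread register usage so that more threads stay resident per SMX. The sketch below is a hypothetical example using CUDA's __launch_bounds__ qualifier; the kernel body and launch parameters are illustrative and not taken from the article.

```cuda
// Hypothetical sketch: trading registers for occupancy on a Kepler SMX.
// An SMX has 65,536 32-bit registers; a kernel using 128 registers per thread
// can keep only 512 threads resident (1/4 of the 2,048-thread maximum).
// __launch_bounds__(256, 8) requests 8 blocks of 256 threads per SMX (2,048
// threads), so the compiler must hold register use to roughly 32 per thread.
__global__ void __launch_bounds__(256, 8)
highOccupancyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i] + 1.0f;   // placeholder arithmetic
}
```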
Another notable feature is that while Fermi GPUs could only be accessed by one CPU thread at a time, the HPC Kepler GPUs added multithreading support so high core count processors could open 32 connections and more easily saturate the compute capability. [10]
Additional die space reduction and power saving was achieved by removing a complex hardware block that handled the prevention of data hazards. [3] [5] [11] [12]
GPU Boost is a new feature which is roughly analogous to turbo boosting of a CPU. The GPU is always guaranteed to run at a minimum clock speed, referred to as the "base clock". This clock speed is set to the level which will ensure that the GPU stays within TDP specifications, even at maximum loads. [3] When loads are lower, however, there is room for the clock speed to be increased without exceeding the TDP. In these scenarios, GPU Boost will gradually increase the clock speed in steps, until the GPU reaches a predefined power target of 170W by default (on the 680 card). [5] By taking this approach, the GPU will ramp its clock up or down dynamically, so that it is providing the maximum amount of speed possible while remaining within TDP specifications.
The power target, as well as the size of the clock increase steps that the GPU will take, are both adjustable via third-party utilities and provide a means of overclocking Kepler-based cards. [3]
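Such utilities typically build on NVML, Nvidia's management library. A minimal host-side sketch of reading the current power target and SM clock, assuming GPU index 0 and an NVML-capable driver (link with -lnvidia-ml), might look like this:

```cuda
// Host-only sketch using NVML to observe GPU Boost behaviour: the board power
// limit and the current SM clock, which rises above the base clock under boost.
#include <nvml.h>
#include <stdio.h>

int main(void) {
    nvmlDevice_t dev;
    unsigned int limit_mw = 0, sm_clock_mhz = 0;

    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetHandleByIndex(0, &dev);              // first GPU in the system

    // Current power target in milliwatts (e.g. ~170000 on a reference GTX 680)
    nvmlDeviceGetPowerManagementLimit(dev, &limit_mw);
    // Current SM clock in MHz
    nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm_clock_mhz);

    printf("power limit: %u W, SM clock: %u MHz\n", limit_mw / 1000, sm_clock_mhz);
    nvmlShutdown();
    return 0;
}
```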
Nvidia Fermi and Kepler GPUs in the GeForce 600 series support the Direct3D 11.0 specification. Nvidia originally stated that the Kepler architecture has full DirectX 11.1 support, which includes the Direct3D 11.1 path. [13] The following "Modern UI" Direct3D 11.1 features, however, are not supported: [14] [15]
- Target-Independent Rasterization (2D rendering only)
- 16x MSAA Rasterization (2D rendering only)
- Orthogonal Line Rendering Mode
- UAV (Unordered Access View) in non-pixel-shader stages
According to Microsoft's definition, Direct3D feature level 11_1 must be complete, otherwise the Direct3D 11.1 path cannot be executed. [16] The integrated Direct3D features of the Kepler architecture are the same as those of the GeForce 400 series Fermi architecture. [15]
Nvidia Kepler GPUs of the GeForce 600/700 series support Direct3D 12 feature level 11_0. [17]
Exclusive to Kepler GPUs, TXAA is an anti-aliasing method from Nvidia designed for direct implementation into game engines. TXAA is based on the MSAA technique and custom resolve filters. It is designed to address a key problem in games known as shimmering, or temporal aliasing; TXAA resolves it by smoothing the scene in motion, ensuring that in-game scenes are cleared of aliasing and shimmering. [3]
The GK110 had a small number of instructions added to further improve performance. New shuffle instructions allow threads within a warp to share data among themselves: an operation that previously required separate store and load accesses to local (shared) memory can now be completed in a single instruction, making the process around 6% faster than using local data storage. Atomic operations were also improved, with up to 9× speed increases for some instructions and the addition of more 64-bit atomic operations, namely min, max, and, or, and xor. [11]
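A minimal sketch of a warp-level sum reduction built on these shuffle instructions is shown below. Kepler-era toolkits exposed the intrinsic as __shfl_down; current CUDA versions require the _sync variants used here. The kernel name and launch configuration are illustrative.

```cuda
// Sketch: sum 32 values across one warp using shuffle instructions instead of
// the shared-memory store/load pair that pre-Kepler code needed.
__global__ void warpReduceSum(const float *in, float *out)
{
    float val = in[threadIdx.x];          // launched with a single warp of 32 threads

    // Each step pulls a value from a lane `offset` positions higher in the warp.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);

    if (threadIdx.x == 0)
        *out = val;                       // lane 0 now holds the sum of all 32 lanes
}
```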
Hyper-Q expands the GK110's hardware work queues from 1 to 32. This is significant because, with a single work queue, Fermi could be under-occupied at times, as there was not enough work in that queue to fill every SM. With 32 work queues, the GK110 can in many scenarios achieve higher utilization by placing different task streams on what would otherwise be an idle SMX. The simple nature of Hyper-Q is further reinforced by the fact that it is easily mapped to MPI, a message-passing interface frequently used in HPC. Legacy MPI-based algorithms originally designed for multi-CPU systems, which became bottlenecked by false dependencies, now have a solution: by increasing the number of MPI jobs, it is possible to utilize Hyper-Q on these algorithms to improve efficiency without changing the code itself. [11]
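A hedged sketch of how an application exposes independent work to Hyper-Q through CUDA streams follows; the kernel, buffer names, and grid sizes are illustrative.

```cuda
#include <cuda_runtime.h>

// Illustrative small kernel; on its own it cannot fill every SMX.
__global__ void smallKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Feed independent work through separate streams; on GK110, Hyper-Q gives each
// stream its own hardware queue so the small grids can run side by side.
void launchConcurrent(float **buffers, int n)
{
    const int kStreams = 32;                       // matches GK110's 32 work queues
    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        smallKernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], n);
    }
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaStreamDestroy(streams[s]);
    }
}
```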
Dynamic Parallelism is the ability for kernels to dispatch other kernels. With Fermi, only the CPU could dispatch a kernel, which incurs a certain amount of overhead from having to communicate back to the CPU. By giving kernels the ability to dispatch their own child kernels, the GK110 can both save time by not having to go back to the CPU and free up the CPU to work on other tasks. [11]
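A minimal sketch of such a device-side launch is shown below, assuming compute capability 3.5 or later and compilation with relocatable device code (e.g. nvcc -arch=sm_35 -rdc=true -lcudadevrt); the kernels themselves are illustrative.

```cuda
// Sketch of Dynamic Parallelism: a parent kernel launching a child kernel
// directly from the device, with no round trip to the CPU.
__global__ void childKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void parentKernel(float *data, int n)
{
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        // The GPU dispatches the child grid itself; the parent grid is not
        // considered complete until its child grids have finished.
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
    }
}
```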
Enabling Dynamic Parallelism requires a new grid management and dispatch control system. The new Grid Management Unit (GMU) manages and prioritizes grids to be executed. The GMU can pause the dispatch of new grids and queue pending and suspended grids until they are ready to execute, providing the flexibility to enable powerful runtimes, such as Dynamic Parallelism. The CUDA Work Distributor in Kepler holds grids that are ready to dispatch, and is able to dispatch 32 active grids, which is double the capacity of the Fermi CWD. The Kepler CWD communicates with the GMU via a bidirectional link that allows the GMU to pause the dispatch of new grids and to hold pending and suspended grids until needed. The GMU also has a direct connection to the Kepler SMX units to permit grids that launch additional work on the GPU via Dynamic Parallelism to send the new work back to GMU to be prioritized and dispatched. If the kernel that dispatched the additional workload pauses, the GMU will hold it inactive until the dependent work has completed. [12]
Nvidia GPUDirect is a capability that enables GPUs within a single computer, or GPUs in different servers located across a network, to directly exchange data without needing to go to CPU/system memory. The RDMA feature in GPUDirect allows third party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple GPUs within the same system, significantly decreasing the latency of MPI send and receive messages to/from GPU memory. [18] It also reduces demands on system memory bandwidth and frees the GPU DMA engines for use by other CUDA tasks. The Kepler GK110 die also supports other GPUDirect features including Peer‐to‐Peer and GPUDirect for Video.
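As an illustration of the Peer-to-Peer capability, the sketch below copies a buffer directly between two GPUs using the CUDA runtime; device indices and buffer size are assumed, and error handling is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Sketch: GPUDirect Peer-to-Peer copy between GPU 0 and GPU 1 without staging
// the data through system memory.
void p2pCopy(size_t bytes)
{
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);     // can GPU 0 reach GPU 1?
    if (!canAccess) return;

    float *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);              // map GPU 1's memory on GPU 0
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // The copy travels directly between the GPUs' DMA engines over PCIe.
    cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
}
```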
NVENC is Nvidia's power-efficient fixed-function encoder, able to take in content, decode, preprocess, and encode it as H.264. Its input and output formats are limited to H.264, but within that format NVENC can encode at resolutions up to 4096×4096. [19]
Like Intel's QuickSync, NVENC is currently exposed through a proprietary API, though Nvidia does have plans to provide NVENC usage through CUDA. [19]
The theoretical single-precision processing power of a Kepler GPU in GFLOPS is computed as 2 (operations per FMA instruction per CUDA core per cycle) × number of CUDA cores × core clock speed (in GHz). Note that, like the previous-generation Fermi, Kepler cannot benefit from increased processing power by dual-issuing MAD+MUL as Tesla could.
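A small sketch applying this formula at run time is shown below; the 192 cores-per-SMX figure is specific to Kepler (compute capability 3.x) and would have to change for other architectures.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int coresPerSMX = 192;                        // Kepler SMX
    double clockGHz = prop.clockRate / 1e6;             // clockRate is reported in kHz
    double gflops = 2.0 * coresPerSMX * prop.multiProcessorCount * clockGHz;

    // Example: a GTX 680 (8 SMX = 1536 cores at ~1.006 GHz) gives ~3090 GFLOPS.
    printf("theoretical FP32: %.0f GFLOPS\n", gflops);
    return 0;
}
```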
The theoretical double-precision processing power of a Kepler GK110/210 GPU is 1/3 of its single precision performance. This double-precision processing power is however only available on professional Quadro, Tesla, and high-end Titan-branded GeForce cards, while drivers for consumer GeForce cards limit the performance to 1/24 of the single precision performance. [20] The lower performance GK10x dies are similarly capped to 1/24 of the single precision performance. [21]
Kepler
GK104 | GK106 | GK107 | GK110 | |||
---|---|---|---|---|---|---|
Variant(s) | GK104-200-A2 GK104-300-A2 GK104-325-A2 GK104-400-A2 GK104-425-A2 GK104-850-A2 | GK106-240-A1 GK106-400-A1 | GK107-300-A2 GK107-301-A2 GK107-320-A2 GK107-400-A2 GK107-425-A2 GK107-450-A2 GK107-810-A2 | GK110-300-A1 GK110-400-A1 GK110-425-B1 GK110-885-A1 | |
Release date | Apr 3, 2012 | Sep 6, 2012 | Sep 6, 2012 | Nov 12, 2012 | ||
Cores | CUDA Cores | 1536 | 960 | 384 | 2880 | |
TMUs | 128 | 80 | 32 | 240 | ||
ROPs | 32 | 24 | 16 | 48 | ||
Streaming Multiprocessors | 8 | 5 | 2 | 15 | ||
GPCs | 4 | 3 | 1 | 5 | ||
Cache | L1 | 128 KB | 80 KB | 32 KB | 240 KB | |
L2 | 512 KB | 384 KB | 256 KB | 1.5 MB | |
Memory interface | 256-bit | 192-bit | 128-bit | 384-bit | |
Die size | 294 mm2 | 221 mm2 | 118 mm2 | 561 mm2 | ||
Transistor count | 3.54 bn. | 2.54 bn. | 1.27 bn. | 7.08 bn. | ||
Transistor density | 12.0 MTr/mm2 | 11.5 MTr/mm2 | 10.8 MTr/mm2 | 12.6 MTr/mm2 | ||
Package socket | BGA 1745 | BGA 1425 | BGA 908 | BGA 2152 | ||
Products | ||||||
Consumer | Desktop | GTX 660 GTX 660 Ti GTX 670 GTX 680 GTX 690 GTX 760 GTX 760 Ti GTX 770 | GTX 650 GTX 650 Ti GTX 660 | GT 630 GTX 650 GT 720 GT 730 GT 740 | GTX 780 GTX Titan | |
Mobile | GTX 670MX GTX 675MX GTX 680M GTX 680MX GTX 775M GTX 780M GTX 860M GTX 870M GTX 880M | GTX 765M GTX 770M | GT 640M GTX 640M LE GT 645M GT 650M GTX 660M GT 740M GT 745M GT 750M GT 755M GTX 810M GTX 820M | — | ||
Workstation | Desktop | Quadro K4200 Quadro K5000 | Quadro K4000 Quadro K5000 | Quadro K410 Quadro K420 Quadro K600 Quadro K2000 Quadro K2000D | Quadro K5200 Quadro K6000 | |
Mobile | Quadro K3000M Quadro K3100M Quadro K4000M Quadro K4100M Quadro K5000M Quadro K5100M | — | Quadro K100M Quadro K200M Quadro K500M Quadro K1000M Quadro K1100M Quadro K2000M | — | ||
GeForce is a brand of graphics processing units (GPUs) designed by Nvidia and marketed for the performance market. As of the GeForce 40 series, there have been eighteen iterations of the design. The first GeForce products were discrete GPUs designed for add-on graphics boards, intended for the high-margin PC gaming market, and later diversification of the product line covered all tiers of the PC graphics market, ranging from cost-sensitive GPUs integrated on motherboards, to mainstream add-in retail boards. Most recently, GeForce technology has been introduced into Nvidia's line of embedded application processors, designed for electronic handhelds and mobile handsets.
A graphics processing unit (GPU) is a specialized electronic circuit initially designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.
Quadro was Nvidia's brand for graphics cards intended for use in workstations running professional computer-aided design (CAD), computer-generated imagery (CGI), digital content creation (DCC) applications, scientific calculations and machine learning from 2000 to 2020.
Tesla is the codename for a GPU microarchitecture developed by Nvidia, released in 2006 as the successor to the Curie microarchitecture. It was named after the pioneering electrical engineer Nikola Tesla. As Nvidia's first microarchitecture to implement unified shaders, it was used in the GeForce 8 series, GeForce 9 series, GeForce 100 series, GeForce 200 series, and GeForce 300 series of GPUs, collectively manufactured in 90 nm, 80 nm, 65 nm, 55 nm, and 40 nm. It was also used in the GeForce 405 and in the Quadro FX, Quadro x000, and Quadro NVS series, as well as in Nvidia Tesla computing modules.
The GeForce 400 series is a series of graphics processing units developed by Nvidia, serving as the introduction of the Fermi microarchitecture. Its release was originally slated in November 2009, however, after delays, it was released on March 26, 2010, with availability following in April 2010.
The GeForce 500 series is a series of graphics processing units developed by Nvidia, as a refresh of the Fermi based GeForce 400 series. It was first released on November 9, 2010 with the GeForce GTX 580.
The GeForce 600 series is a series of graphics processing units developed by Nvidia, first released in 2012. It served as the introduction of the Kepler architecture. It is succeeded by the GeForce 700 series.
The GeForce 700 series is a series of graphics processing units developed by Nvidia. While mainly a refresh of the Kepler microarchitecture, some cards use Fermi (GF) and later cards use Maxwell (GM). GeForce 700 series cards were first released in 2013, starting with the release of the GeForce GTX Titan on February 19, 2013, followed by the GeForce GTX 780 on May 23, 2013. The first mobile GeForce 700 series chips were released in April 2013.
Fermi is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia, first released to retail in April 2010, as the successor to the Tesla microarchitecture. It was the primary microarchitecture used in the GeForce 400 series and GeForce 500 series. All desktop Fermi GPUs were manufactured in 40 nm; mobile Fermi GPUs were manufactured in 40 nm and 28 nm. Fermi is the oldest Nvidia microarchitecture that received support for Microsoft's rendering API Direct3D 12 at feature level 11.
The GeForce 800M series is a family of graphics processing units by Nvidia for laptop PCs. It consists of rebrands of mobile versions of the GeForce 700 series and some newer chips that are lower end compared to the rebrands.
The GeForce 900 series is a family of graphics processing units developed by Nvidia, succeeding the GeForce 700 series and serving as the high-end introduction to the Maxwell microarchitecture, named after James Clerk Maxwell. They are produced with TSMC's 28 nm process.
The GeForce 10 series is a series of graphics processing units developed by Nvidia, initially based on the Pascal microarchitecture announced in March 2014. This design series succeeded the GeForce 900 series, and is succeeded by the GeForce 16 series and GeForce 20 series using the Turing microarchitecture.
Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing or general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began using GPUs from the G80 series, and have continued to accompany the release of new chips. They are programmable using the CUDA or OpenCL APIs.
Maxwell is the codename for a GPU microarchitecture developed by Nvidia as the successor to the Kepler microarchitecture. The Maxwell architecture was introduced in later models of the GeForce 700 series and is also used in the GeForce 800M series, GeForce 900 series, and Quadro Mxxx series, as well as some Jetson products.
Pascal is the codename for a GPU microarchitecture developed by Nvidia, as the successor to the Maxwell architecture. The architecture was first introduced in April 2016 with the release of the Tesla P100 (GP100) on April 5, 2016, and is primarily used in the GeForce 10 series, starting with the GeForce GTX 1080 and GTX 1070, which were released on May 27, 2016, and June 10, 2016, respectively. Pascal was manufactured using TSMC's 16 nm FinFET process, and later Samsung's 14 nm FinFET process.
Nvidia NVDEC is a feature in its graphics cards that performs video decoding, offloading this compute-intensive task from the CPU. NVDEC is a successor of PureVideo and is available in Kepler and later NVIDIA GPUs.
Turing is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is named after the prominent mathematician and computer scientist Alan Turing. The architecture was first introduced in August 2018 at SIGGRAPH 2018 in the workstation-oriented Quadro RTX cards, and one week later at Gamescom in consumer GeForce 20 series graphics cards. Building on the preliminary work of Volta, its HPC-exclusive predecessor, the Turing architecture introduces the first consumer products capable of real-time ray tracing, a longstanding goal of the computer graphics industry. Key elements include dedicated artificial intelligence processors and dedicated ray tracing processors. Turing leverages DXR, OptiX, and Vulkan for access to ray tracing. In February 2019, Nvidia released the GeForce 16 series GPUs, which utilizes the new Turing design but lacks the RT and Tensor cores.
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère.
Ada Lovelace, also referred to simply as Lovelace, is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Ampere architecture, officially announced on September 20, 2022. It is named after the English mathematician Ada Lovelace, one of the first computer programmers. Nvidia announced the architecture along with the GeForce RTX 40 series consumer GPUs and the RTX 6000 Ada Generation workstation graphics card. The Lovelace architecture is fabricated on TSMC's custom 4N process which offers increased efficiency over the previous Samsung 8 nm and TSMC N7 processes used by Nvidia for its previous-generation Ampere architecture.
The card's driver deliberately operates the GK110's FP64 units at 1/8 of the GPU's clock rate. Multiplying that by the 3:1 ratio of single- to double-precision CUDA cores yields the 1/24 FP64 rate.