Project Denver

Nvidia Denver 1/2
General information
Launched: 2014 (Denver), 2016 (Denver 2)
Designed by: Nvidia
Cache
L1 cache: 192 KiB per core (128 KiB I-cache with parity, 64 KiB D-cache with ECC)
L2 cache: 2 MiB @ 2 cores
Architecture and classification
Technology node: 28 nm (Denver 1) to 16 nm (Denver 2)
Instruction set: ARMv8-A
Physical specifications
Cores: 2

Nvidia Carmel
General information
Launched: 2018
Designed by: Nvidia
Max. CPU clock rate: up to 2.3 GHz
Cache
L1 cache: 192 KiB per core (128 KiB I-cache with parity, 64 KiB D-cache with ECC)
L2 cache: 2 MiB @ 2 cores
L3 cache: 4 MiB @ 8 cores (T194) [1]
Architecture and classification
Technology node: 12 nm
Instruction set: ARMv8.2-A
Physical specifications
Cores: 2

Project Denver is the codename of a central processing unit designed by Nvidia that implements the ARMv8-A 64/32-bit instruction sets using a combination of a simple hardware decoder and software-based binary translation (dynamic recompilation), where "Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128 MB cache stored in main memory". [2] Denver is a very wide in-order superscalar pipeline. Its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor, etc.) into one die constituting a system on a chip (SoC).
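
The general idea of caching already optimized code sequences can be illustrated with a small sketch. The following Python example is a toy model only, not Nvidia's implementation: the two-opcode guest instruction format, the `HOT_THRESHOLD` value, and the `TranslationCache` class are all hypothetical names chosen for illustration. A block of guest code is first executed through a slow step-by-step path; once it has run often enough, it is translated once into a single optimized routine, which is stored in a cache keyed by guest address and reused on later executions.

```python
from typing import Callable, Dict, List, Tuple

# Toy guest instruction: (opcode, operand). Hypothetical format, not ARM.
Instr = Tuple[str, int]

# Assumed number of slow-path executions before a block is treated as "hot".
HOT_THRESHOLD = 3


class TranslationCache:
    """Toy dynamic recompiler: interpret cold blocks, cache hot translations."""

    def __init__(self, program: Dict[int, List[Instr]]) -> None:
        self.program = program                      # guest address -> block
        self.exec_counts: Dict[int, int] = {}       # slow-path run counters
        self.cache: Dict[int, Callable[[int], int]] = {}
        self.translations = 0                       # how often we translated

    def _interpret(self, block: List[Instr], acc: int) -> int:
        # Slow path: decode and execute one instruction at a time.
        for op, arg in block:
            if op == "add":
                acc += arg
            elif op == "mul":
                acc *= arg
            else:
                raise ValueError(f"unknown opcode: {op}")
        return acc

    def _translate(self, block: List[Instr]) -> Callable[[int], int]:
        # "Optimization": fold the whole block into acc * scale + offset.
        scale, offset = 1, 0
        for op, arg in block:
            if op == "add":
                offset += arg
            elif op == "mul":
                scale *= arg
                offset *= arg
            else:
                raise ValueError(f"unknown opcode: {op}")
        self.translations += 1
        return lambda acc: acc * scale + offset

    def run_block(self, addr: int, acc: int) -> int:
        translated = self.cache.get(addr)
        if translated is not None:
            return translated(acc)                  # fast path: cache hit
        count = self.exec_counts.get(addr, 0) + 1
        self.exec_counts[addr] = count
        result = self._interpret(self.program[addr], acc)
        if count >= HOT_THRESHOLD:
            # Block is hot: translate once and keep the result for reuse.
            self.cache[addr] = self._translate(self.program[addr])
        return result


# One guest block at address 0x1000 computing (acc + 2) * 3 + 1.
tc = TranslationCache({0x1000: [("add", 2), ("mul", 3), ("add", 1)]})
results = [tc.run_block(0x1000, 5) for _ in range(5)]
```

In this sketch the first three runs take the slow path and the remaining runs use the cached routine; both paths are meant to produce the same result for the same input, which is the property any binary translator has to preserve.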

Project Denver is targeted at mobile computers, personal computers, servers, as well as supercomputers. [3] The respective cores have been integrated into the Tegra SoC series from Nvidia. Initially, Denver cores were designed for the 28 nm process node (Tegra model T132, aka "Tegra K1"). Denver 2 was an improved design built for the smaller, more efficient 16 nm node (Tegra model T186, aka "Tegra X2").

In 2018, Nvidia released an improved design, codenamed "Carmel", based on ARMv8 (64-bit; variant: ARMv8.2 [4]) with 10-way superscalar execution, functional safety, dual execution, and parity and ECC protection. It was integrated into the Tegra Xavier SoC, which offers a total of 8 cores (4 dual-core pairs). [5] The Carmel CPU core supports full Advanced SIMD (ARM NEON), VFP (Vector Floating Point), and ARMv8.2-FP16. [6] The first published third-party tests of Carmel cores, run on the Jetson AGX development kit in September 2018, indicated noticeably higher performance than predecessor systems, as would be expected, although the quick nature of such a test setup leaves some doubt about the results. [7] The Carmel design can be found in the Tegra model T194 ("Tegra Xavier"), which is manufactured with a 12 nm structure size.

Overview

Chips

A dual-core Denver CPU was paired with a Kepler-based GPU solution to form the Tegra K1; the dual-core 2.3 GHz Denver-based K1 was first used in the HTC Nexus 9 tablet, released November 3, 2014. [10] [11] Note, however, that the quad-core Tegra K1, while using the same name, isn't based on Denver.

The Nvidia Tegra X2 has two Denver 2 (ARMv8 64-bit) cores and another four Cortex-A57 (ARMv8 64-bit) cores, using a coherent HMP (Heterogeneous Multi-Processor Architecture) approach. [12] The SoC, codenamed Parker, pairs these CPU cores with a Pascal-based GPU.

The Tegra Xavier pairs an Nvidia Volta GPU and several special-purpose accelerators with 8 CPU cores of the Carmel design. In this design, 4 Carmel ASIC macro blocks (each having 2 cores) are connected to each other through a crossbar and share 4 MiB of L3 cache.

History

The existence of Project Denver was revealed at the 2011 Consumer Electronics Show. [13] In a March 4, 2011 Q&A article, CEO Jen-Hsun Huang revealed that Project Denver is a five-year 64-bit ARMv8-A architecture CPU development on which hundreds of engineers had already worked for three and a half years, and which also has 32-bit ARM instruction set (ARMv7) backward compatibility. [14] Project Denver was started at Stexar Company (Colorado) as an x86-compatible processor using binary translation, similar to projects by Transmeta. Stexar was acquired by Nvidia in 2006. [15] [16] [17]

According to Tom's Hardware, there are engineers from Intel, AMD, HP, Sun and Transmeta on the Denver team, and they have extensive experience designing superscalar CPUs with out-of-order execution, very long instruction words (VLIW) and simultaneous multithreading (SMT). [18]

According to Charlie Demerjian, the Project Denver CPU may internally translate the ARM instructions to an internal instruction set, using firmware in the CPU. [19] Also according to Demerjian, Project Denver was originally intended to support both ARM and x86 code using code morphing technology from Transmeta, but was changed to the ARMv8-A 64-bit instruction set because Nvidia could not obtain a license to Intel's patents. [19]

The first consumer device shipping with Denver CPU cores, Google's Nexus 9, was announced on October 15, 2014. The tablet was manufactured by HTC and features the dual-core Tegra K1 SoC. The Nexus 9 was the first 64-bit Android device available to consumers. [20]

References

  1. Franklin, Dustin (December 12, 2018). "NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics". Nvidia (development team for Jetson).
  2. Wasson, Scott (August 11, 2014). "Nvidia claims Haswell-class performance for Denver CPU core". The Tech Report. Retrieved August 14, 2014.
  3. Dally, Bill (January 5, 2011). ""PROJECT DENVER" PROCESSOR TO USHER IN NEW ERA OF COMPUTING". Official Nvidia blog.
  4. Franklin, Dustin (December 12, 2018). "NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics". Nvidia (development team for Jetson).
  5. Mujtaba, Hassan (January 8, 2018). "NVIDIA Drive Xavier SOC Detailed". WccfTech.
  6. Franklin, Dustin (December 12, 2018). "NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics". Nvidia (development team for Jetson).
  7. "A Quick Test of NVIDIA's "Carmel" CPU Performance".
  8. Hachman, Mark (August 11, 2014). "Nvidia reveals PC-like performance for 'Denver' Tegra K1". PC World. Retrieved September 19, 2014.
  9. Anthony, Sebastian (January 6, 2014). "Tegra K1 64-bit Denver core analysis: Are Nvidia's x86 efforts hidden within?". ExtremeTech. Retrieved January 7, 2014.
  10. "Nexus 9 storms through Geekbench, Tegra K1 outperforms Apple iPhone 6's A8". 16 October 2014.
  11. Shimpi, Anand (January 5, 2014). "NVIDIA Announces Tegra K1 SoC with Optional Denver CPU Cores". Anandtech. Retrieved January 6, 2014.
  12. "NVIDIA Unveils Tegra Parker SOC at Hot Chips – Built on 16nm TSMC Process, Features Pascal and Denver 2 Duo Architecture". August 22, 2016.
  13. "Nvidia's press conference webcast". http://www.nvidia.com/object/ces2011.html
  14. Takahashi, Dean (March 4, 2011). "Q&A: Nvidia chief explains his strategy for winning in mobile computing".
  15. Valich, Theo (December 12, 2011). "NVIDIA Project Denver "Lost in Rockies", to Debut in 2014-15".
  16. Miller, Paul (October 19, 2006). "NVIDIA has x86 CPU in the works?". Engadget. Retrieved October 19, 2013.
  17. Valich, Theo (March 20, 2013). "New Tegra Roadmap Reveals Logan, Parker and Kayla CUDA Strategy".
  18. Parrish, Kevin (October 14, 2013). "64-bit Nvidia Tegra 6 "Parker" Chip May Arrive in 2014. Devices with a 64-bit Tegra 6 could launch before the end of 2014". Tom's Hardware & ExtremeTech. Retrieved October 19, 2013.
  19. Demerjian, Charlie (August 5, 2011). "What is Project Denver based on?". Semiaccurate.
  20. Amadeo, Ron (October 15, 2014). "Google announces Nexus 6, Nexus 9, Nexus Player, and Android 5.0 Lollipop".