Larrabee (microarchitecture)

The Larrabee GPU architecture, unveiled at the SIGGRAPH conference in August 2008

Larrabee is the codename for a cancelled GPGPU chip that Intel was developing separately from its then-current line of integrated graphics accelerators. It is named after either Mount Larrabee or Larrabee State Park in Whatcom County, Washington, near the town of Bellingham. [1] [2] The chip was to be released in 2010 as the core of a consumer 3D graphics card, but these plans were cancelled due to delays and disappointing early performance figures. [3] [4] The project to produce a GPU retail product directly from the Larrabee research project was terminated in May 2010, [5] and its technology was passed on to the Xeon Phi. The Intel MIC multiprocessor architecture announced in 2010 inherited many design elements from the Larrabee project, but does not function as a graphics processing unit; the product is intended as a co-processor for high-performance computing.

Almost a decade later, on June 12, 2018, the idea of a dedicated Intel GPU was revived with Intel's plan to create a discrete GPU by 2020. [6] This project would eventually become the Intel Xe and Intel Arc series, released in September 2020 and March 2022, respectively, but both were unconnected to the work on the Larrabee project.

Project status

On December 4, 2009, Intel officially announced that the first-generation Larrabee would not be released as a consumer GPU product. [7] Instead, it was to be released as a development platform for graphics and high-performance computing. Intel attributed the strategic reset to delays in hardware and software development. [8] On May 25, 2010, the Technology@Intel blog announced that Larrabee would not be released as a GPU, but would instead be released as a product for high-performance computing, competing with the Nvidia Tesla. [9]

The project to produce a GPU retail product directly from the Larrabee research project was terminated in May 2010. [5] The Intel MIC multiprocessor architecture announced in 2010 inherited many design elements from the Larrabee project, but does not function as a graphics processing unit; the product is intended as a co-processor for high-performance computing. The prototype card was named Knights Ferry; a production card named Knights Corner, built on a 22 nm process, was planned for production in 2012 or later. [citation needed]

Comparison with competing products

According to Intel, Larrabee has a fully programmable pipeline, in contrast to current generation graphics cards which are only partially programmable.

Larrabee can be considered a hybrid between a multi-core CPU and a GPU, and has similarities to both. Its coherent cache hierarchy and x86 architecture compatibility are CPU-like, while its wide SIMD vector units and texture sampling hardware are GPU-like.

As a GPU, Larrabee would have supported traditional rasterized 3D graphics (Direct3D and OpenGL) for games. However, its hybridization of CPU and GPU features should also have been suitable for general-purpose GPU (GPGPU) or stream processing tasks. For example, it might have performed ray tracing or physics processing [10] in real time for games, or offline for scientific research as a component of a supercomputer. [11]

Larrabee's early presentation drew some criticism from GPU competitors. At NVISION 08, an Nvidia employee called Intel's SIGGRAPH paper about Larrabee "marketing puff" and quoted an industry analyst (Peter Glaskowsky) who speculated that the Larrabee architecture was "like a GPU from 2006". [12] By June 2009, Intel claimed that prototypes of Larrabee were on par with the Nvidia GeForce GTX 285. [13] Justin Rattner, Intel CTO, delivered a keynote at the Supercomputing 2009 conference on November 17, 2009. During his talk he demonstrated an overclocked Larrabee processor topping one teraFLOPS in performance. He claimed this was the first public demonstration of a single-chip system exceeding one teraFLOPS. He pointed out that this was early silicon, leaving open the question of the architecture's eventual performance. Because this was only one fifth of the performance of available competing graphics boards, Larrabee was cancelled "as a standalone discrete graphics product" on December 4, 2009. [3]

Differences with contemporary GPUs

Larrabee was intended to differ from older discrete GPUs such as the GeForce 200 series and the Radeon 4000 series in three major ways:

This had been expected to make Larrabee more flexible than current GPUs, allowing more differentiation in appearance between games or other 3D applications. Intel's SIGGRAPH 2008 paper mentioned several rendering features that were difficult to achieve on current GPUs: render target read, order-independent transparency, irregular shadow mapping, and real-time raytracing. [14]

More recent GPUs such as ATI's Radeon HD 5xxx and Nvidia's GeForce 400 series feature increasingly broad general-purpose computing capabilities via DirectX 11 DirectCompute and OpenCL, as well as Nvidia's proprietary CUDA technology, giving them many of the capabilities of Larrabee.

Differences with CPUs

The x86 processor cores in Larrabee differed in several ways from the cores in current Intel CPUs such as the Core 2 Duo or Core i7:

Theoretically Larrabee's x86 processor cores would have been able to run existing PC software, or even operating systems. A different version of the processor might sit in motherboard CPU sockets using QuickPath, [17] but Intel never announced any plans for this. Though Larrabee's native C/C++ compiler included auto-vectorization and many applications were able to execute correctly after having been recompiled, maximum efficiency was expected to have required code optimization using C++ vector intrinsics or inline Larrabee assembly code. [14] However, as in all GPGPUs, not all software would have benefited from utilization of a vector processing unit. One tech journalism site claims that Larrabee's graphics capabilities were planned to be integrated in CPUs based on the Haswell microarchitecture. [18]

Comparison with the Cell Broadband Engine

Larrabee's philosophy of using many small, simple cores was similar to the ideas behind the Cell processor. There are some further commonalities, such as the use of a high-bandwidth ring bus to communicate between cores. [14] However, there were many significant differences in implementation which were expected to make programming Larrabee simpler.

Comparison with Intel GMA

Intel began integrating a line of GPUs onto motherboards under the Intel GMA brand in 2004. Because these chips were integrated onto motherboards (newer versions, such as those released with Sandy Bridge, are incorporated onto the same die as the CPU), they were not sold separately. Though the low cost and power consumption of Intel GMA chips made them suitable for small laptops and less demanding tasks, they lacked the 3D graphics processing power to compete with contemporary Nvidia and AMD/ATI GPUs for a share of the high-end gaming computer market, the HPC market, or a place in popular video game consoles. In contrast, Larrabee was to be sold as a discrete GPU, separate from motherboards, and was expected to perform well enough for consideration in the next generation of video game consoles. [19] [20]

The team working on Larrabee was separate from the Intel GMA team. The hardware was designed by a newly formed team at Intel's Hillsboro, Oregon, site, separate from those that designed the Nehalem. The software and drivers were written by a newly formed team. The 3D stack specifically was written by developers at RAD Game Tools (including Michael Abrash). [21]

The Intel Visual Computing Institute was to research basic and applied technologies that could be applied to Larrabee-based products. [22]

Projected performance data

Benchmarking results from the 2008 SIGGRAPH paper, showing predicted performance as an approximate linear function of the number of processing cores

Intel's SIGGRAPH 2008 paper describes cycle-accurate simulations of Larrabee's projected performance, with the limitations of memory, caches and texture units included. [14] Graphs show how many 1 GHz Larrabee cores are required to maintain 60 frame/s at 1600×1200 resolution in several popular games. Roughly 25 cores are required for Gears of War with no antialiasing, 25 cores for F.E.A.R. with 4× antialiasing, and 10 cores for Half-Life 2: Episode Two with 4× antialiasing. Intel claimed that Larrabee would likely run faster than 1 GHz, so these numbers do not represent actual cores but rather virtual timeslices of them. Another graph shows that performance on these games scales nearly linearly with the number of cores up to 32 cores. At 48 cores, performance drops to 90% of what would be expected if the linear relationship continued. [23]
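Because the core counts above are expressed in 1 GHz core-equivalents, the number of physical cores they imply depends on the clock speed. The following is a minimal sketch of that conversion and of the 48-core scaling figure, assuming throughput scales linearly with clock frequency. The 2 GHz clock and the function names are illustrative assumptions, not figures from the paper.

```python
import math

# 1 GHz core-equivalents needed for 60 frame/s at 1600x1200, as quoted in the text.
CORE_EQUIVALENTS = {
    "Gears of War (no AA)": 25,
    "F.E.A.R. (4x AA)": 25,
    "Half-Life 2: Episode Two (4x AA)": 10,
}

ASSUMED_CLOCK_GHZ = 2.0  # hypothetical clock, chosen only for illustration


def physical_cores(core_equivalents, clock_ghz):
    """Whole cores needed at clock_ghz, assuming linear scaling with clock."""
    return math.ceil(core_equivalents / clock_ghz)


def effective_cores(cores, efficiency):
    """Number of ideal (perfectly linear) cores that the given cores behave like."""
    return cores * efficiency


for game, equivalents in CORE_EQUIVALENTS.items():
    print(f"{game}: {physical_cores(equivalents, ASSUMED_CLOCK_GHZ)} cores")

# 48 cores at 90% of the linear trend behave like roughly 43 ideal cores.
print(f"{effective_cores(48, 0.90):.1f} effective cores")
```

Under these assumptions, the 25 core-equivalents quoted for Gears of War would correspond to 13 physical cores at 2 GHz, and a 48-core part would deliver about as much as 43 perfectly scaling cores.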

A June 2007 PC Watch article suggested that the first Larrabee chips would feature 32 x86 processor cores and come out in late 2009, fabricated on a 45 nanometer process. Chips with a few defective cores due to yield issues would be sold as a 24-core version. Later in 2010, Larrabee would be shrunk for a 32 nanometer fabrication process to enable a 48-core version. [24]

The theoretical maximum performance of the 32-core configuration can be calculated as follows: 32 cores × 16 single-precision float SIMD lanes per core × 2 FLOP (fused multiply-add) × 2 GHz = 2,048 GFLOPS, or roughly 2 TFLOPS.
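The peak-performance arithmetic above can be reproduced with a short sketch. The function name is hypothetical, and the result is a theoretical ceiling that assumes every SIMD lane on every core retires one fused multiply-add per cycle.

```python
def peak_gflops(cores, simd_lanes, flop_per_lane_per_cycle, clock_ghz):
    """Theoretical peak throughput in GFLOPS.

    Assumes every SIMD lane of every core completes
    flop_per_lane_per_cycle operations on every clock cycle.
    """
    return cores * simd_lanes * flop_per_lane_per_cycle * clock_ghz


# Figures from the text: 32 cores, 16 single-precision lanes per core,
# 2 FLOP per fused multiply-add, 2 GHz clock.
peak = peak_gflops(cores=32, simd_lanes=16, flop_per_lane_per_cycle=2, clock_ghz=2.0)
print(f"{peak:.0f} GFLOPS")  # 2048 GFLOPS
```

Real workloads would fall below this figure whenever lanes sit idle or an instruction is not a fused multiply-add.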

Public demonstrations

A public demonstration of Larrabee's ray-tracing capabilities took place at the Intel Developer Forum in San Francisco on September 22, 2009. An experimental version of Enemy Territory: Quake Wars titled Quake Wars: Ray Traced was shown in real time. The scene contained a ray-traced water surface that accurately reflected surrounding objects, such as a ship and several flying vehicles. [25] [26] [27]

A second demo was given at the SC09 conference in Portland on November 17, 2009, during a keynote by Intel CTO Justin Rattner. A Larrabee card achieved 1006 GFLOPS in the SGEMM 4K×4K calculation.
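To put the SGEMM figure in perspective, the sketch below estimates how long one such matrix multiplication would take at the quoted rate. It assumes that "4K" means 4096 (it could also mean 4000) and uses the conventional count of 2n³ floating-point operations for multiplying two n×n matrices. The function name is hypothetical.

```python
def sgemm_flop(n):
    """Floating-point operations for an n x n by n x n multiply (2 * n^3 convention)."""
    return 2 * n ** 3


N = 4096                # assumption: "4K" read as 4096
MEASURED_GFLOPS = 1006  # rate quoted for the SC09 demonstration

flop = sgemm_flop(N)
seconds = flop / (MEASURED_GFLOPS * 1e9)

print(f"{flop / 1e9:.1f} GFLOP per multiply")
print(f"{seconds * 1e3:.0f} ms per multiply at {MEASURED_GFLOPS} GFLOPS")
```

Under these assumptions, each multiplication involves about 137 billion operations and would complete in roughly 0.14 seconds.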

An engineering sample of a Larrabee card was procured and reviewed by Linus Sebastian in a video published May 14, 2018. However, he was unable to get video output from the card; the motherboard displayed POST code D6. [28] In 2022, another card was demonstrated by YouTuber Roman “der8auer” Hartung; it worked and output a display signal, but was not capable of 3D acceleration due to missing drivers. [29]

References

  1. Forsyth, Tom. "SMACNI to AVX512 the life cycle of an instruction set" (PDF).
  2. Forsyth, Tom (2020-12-22). "Tom Forsyth on Naming of Larrabee Instruction Set". Archived from the original on 2020-12-22. Retrieved 2020-12-22.
  3. Crothers, Brooke (December 4, 2009). "Intel: Initial Larrabee graphics chip canceled". CNET. CBS Interactive.
  4. Charlie Demerjian (December 4, 2009). "Intel kills consumer Larrabee, focuses on future variants - SemiAccurate". SemiAccurate.com. Retrieved April 9, 2017.
  5. Smith, Ryan (May 25, 2010). "Intel Kills Larrabee GPU, Will Not Bring a Discrete Graphics Product to Market". AnandTech.
  6. Smith, Ryan (June 13, 2018). "Intel's First (Modern) Discrete GPU Set For 2020". AnandTech. Retrieved November 4, 2018.
  7. Stokes, Jon (5 December 2009). "Intel's Larrabee GPU put on ice, more news to come in 2010". Ars Technica. Condé Nast.
  8. Smith, Ryan. "Intel Cancels Larrabee Retail Products, Larrabee Project Lives On". AnandTech.com. Retrieved April 9, 2017.
  9. "Blogs@Intel - Intel Blogs". Intel.com. Retrieved April 9, 2017.
  10. Stokes, Jon (17 September 2007). "Intel picks up gaming physics engine for forthcoming GPU product". Ars Technica. Retrieved 2007-09-17.
  11. Stokes, Jon (27 April 2007). "Clearing up the confusion over Intel's Larrabee". Ars Technica. Retrieved June 1, 2007.
  12. "Larrabee performance--beyond the sound bite". CNet.com. Retrieved April 9, 2017.
  13. "Intel's 'Larrabee' on Par With GeForce GTX 285". TomsHardware.com. June 2, 2009. Retrieved April 9, 2017.
  14. Seiler, L.; Cavin, D.; Espasa, E.; Grochowski, T.; Juan, M.; Hanrahan, P.; Carmean, S.; Sprangle, A.; Forsyth, J.; Abrash, R.; Dubey, R.; Junkins, E.; Lake, T.; Sugerman, P. (August 2008). "Larrabee: A Many-Core x86 Architecture for Visual Computing" (PDF). ACM Transactions on Graphics. Proceedings of ACM SIGGRAPH 2008. 27 (3): 18:11. doi:10.1145/1360612.1360617. ISSN 0730-0301. S2CID 52799248. Archived from the original (PDF) on 2021-03-07. Retrieved 2008-08-06.
  15. "Intel's Larrabee GPU based on secret Pentagon tech, sorta [Updated]". Ars Technica. 9 July 2008. Retrieved 2008-08-06.
  16. Glaskowsky, Peter. "Intel's Larrabee--more and less than meets the eye". CNET. Retrieved 2008-08-20.
  17. Stokes, Jon (5 June 2007). "Clearing up the confusion over Intel's Larrabee, part II". Ars Technica. Retrieved 2008-01-16.
  18. "Intel to use Larrabee graphics on CPUs - SemiAccurate". SemiAccurate.com. August 19, 2009. Retrieved April 9, 2017.
  19. Chris Leyton (August 13, 2008). "Intel's Larrabee Shaping Up For Next-Gen Consoles?". Archived from the original on August 17, 2008. Retrieved August 24, 2008.
  20. Charlie Demerjian (February 5, 2009). "Intel Will Design PlayStation 4 GPU". Archived from the original on May 11, 2009. Retrieved August 28, 2009.
  21. Wilson, Anand Lal Shimpi & Derek. "Intel's Larrabee Architecture Disclosure: A Calculated First Move". AnandTech.com. Retrieved April 9, 2017.
  22. Ng, Jansen (May 13, 2009). "Intel Visual Computing Institute Opens, Will Spur "Larrabee" Development". DailyTech. Archived from the original on May 16, 2009. Retrieved May 13, 2009.
  23. Steve Seguin (August 20, 2008). "Intel's 'Larrabee' to Shakeup[sic] AMD, Nvidia". Tom's Hardware. Retrieved August 24, 2008.
  24. "Intel is promoting the 32 core CPU "Larrabee"" (in Japanese). pc.watch.impress.co.jp. Retrieved August 6, 2008.
  25. Geeks3D (2008-06-12), Ray Traced Quake Wars, archived from the original on 2021-09-17, retrieved 2022-03-07
  26. "Light It Up! Quake Wars* Gets Ray Traced" (PDF). Archived (PDF) from the original on February 15, 2010. Retrieved 2022-03-07.
  27. "Quake Wars: Ray Traced". 2008-08-18. Archived from the original on 2011-07-19.
  28. Linus Tech Tips (2018-05-14), WE GOT INTEL'S PROTOTYPE GRAPHICS CARD!!, archived from the original on 2021-12-21, retrieved 2019-05-10
  29. der8auer EN (2022-12-24), HW-Legends #13: Intel Canceled This Project - The most expensive Card in my Collection (Larrabee), archived from the original on 2023-07-23, retrieved 2023-07-23