Lion Cove

Lion Cove is a 64-bit, two-way, x86 CPU core architecture designed by Intel. The Lion Cove core is featured in Core Ultra Series 2 Arrow Lake and Lunar Lake processors.

Architecture

Lion Cove is a performance core architecture aimed at providing high compute performance with wider integer and vector execution units, wider fetch and increased core frequencies compared to Intel's density-optimized E-core architectures. Intel claims a 14% IPC increase with the Lion Cove P-core over Redwood Cove. Intel approached the Lion Cove design process with the intention to "remove any transistor from the design that doesn't directly contribute to productivity", stripping down the core design in order to focus on single-threaded performance and core area efficiency. [1] Ori Lempel served as Senior Principal Engineer for the Lion Cove P-core design. [2]

Front end

The front-end of the Lion Cove core for fetching, decoding and issuing instructions has been made wider and deeper. [3] There is 8-way decoding of instructions from the Instruction Queue, up from 6-way decode in Redwood Cove. Likewise, Lion Cove's Out-of-Order Engine uses an 8-way allocation/rename queue, increased from Redwood Cove's 6-way queue. [4] The Out-of-Order Engine splits the renamers and schedulers into dedicated integer and vector domains, which allows Intel to modify each domain independently in future designs without requiring a complete redesign of the Out-of-Order Engine. [5] Each of these domains has its own access to the micro-op queue. [6] The larger op cache and longer op queue benefit efficiency, because micro-ops that are served from the larger cache do not require the decode logic to be powered up again. [5]

| | Redwood Cove | Lion Cove |
| --- | --- | --- |
| Decode | 6-way | 8-way |
| Allocation/Rename | 6-way | 8-way |
| Retirement | 8-wide | 12-wide |
| Deep instruction window | 512 | 576 |
| Execution Ports | 12 | 18 |
| Op Cache | 4096 entry, 8-way | 5250 entry, 12-way |
| Op Queue | 144 entry | 192 entry |

Branch Predictor

Branch prediction has been strengthened in Lion Cove, with the core's prediction block being 8 times wider than in Redwood Cove. [5] The branch predictor in a core tries to predict which path execution will take when code paths diverge at a branch. Lion Cove's L0 Branch Target Buffer (BTB) has been doubled to 256 entries so that it can store more target addresses for taken branches, which helps predict upcoming branches and reduces the number of misses.

Branch target buffer entries

| | Redwood Cove | Lion Cove |
| --- | --- | --- |
| L0 BTB | 128 | 256 |
| L1 BTB | 5K | 6K |
| L2 BTB | 12K | 12K |
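
To illustrate why more BTB entries can reduce misses, the following is a minimal sketch of a toy, direct-mapped branch target buffer. It is not a model of Intel's design: the class name, the indexing scheme and the branch addresses are all assumptions made for illustration, and real BTBs are typically set-associative and tagged differently.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB (hypothetical): branch address -> last taken target."""

    def __init__(self, entries):
        self.entries = entries
        # Each slot holds (branch_pc, target), or None while empty.
        self.slots = [None] * entries

    def _index(self, pc):
        # Assumes 4-byte aligned branch addresses, purely for illustration.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return the stored target for this branch, or None on a BTB miss."""
        slot = self.slots[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def update(self, pc, target):
        """Record the target of a taken branch, evicting any previous occupant."""
        self.slots[self._index(pc)] = (pc, target)


def count_hits(btb, branches, passes=2):
    """Replay the same taken branches several times and count correct targets."""
    hits = 0
    for _ in range(passes):
        for pc, target in branches:
            if btb.predict(pc) == target:
                hits += 1
            btb.update(pc, target)
    return hits


# 200 made-up taken branches, replayed twice against each buffer size.
branches = [(0x1000 + 4 * i, 0x8000 + 4 * i) for i in range(200)]
for size in (128, 256):
    print(size, "entries:", count_hits(BranchTargetBuffer(size), branches), "hits")
```

In this toy workload the 200 branches overflow the 128-entry buffer and evict each other, while the 256-entry buffer retains all of them after the first pass.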

Execution Engine

Integer Unit

Lion Cove increases the number of integer Arithmetic Logic Units (ALUs) to six. Redwood Cove contained five ALUs that used a 256-bit wide pipe. [4] The number of integer multiply units has risen from 1 to 3, which means that the core can execute more than one integer multiply operation per cycle. [7]

Vector Engine

Intel's vector engine design in Lion Cove now more closely resembles that used by AMD since Zen with four pipes for floating point and vector execution. Two of those pipes deal with floating point multiplies and multiply-adds, while the two other pipes handle floating point adds. [8] The number of floating point dividers has increased from 1 to 2 with improved throughput. [2] For handling sort-vector instructions, the vector engine contains 4 SIMD ALUs, up from 3 in Redwood Cove. [8]

Lion Cove supports AVX-512 instructions, but this support is disabled in heterogeneous processor generations like Arrow Lake and Lunar Lake. This is no different from Golden Cove, Raptor Cove or Redwood Cove, which had their AVX-512 support disabled in all heterogeneous non-server products.
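
Whether a given system exposes AVX-512 can be checked from software. The sketch below assumes a Linux system, where the kernel lists CPU feature flags in `/proc/cpuinfo` and reports AVX-512 Foundation as `avx512f`. The helper names `cpu_flags` and `has_avx512f` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text):
    """True if the AVX-512 Foundation flag is exposed to the operating system."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX-512 Foundation exposed:", has_avx512f(f.read()))
    except OSError:
        # Non-Linux systems do not provide this file.
        print("/proc/cpuinfo is not available on this system")
```

On a processor where AVX-512 is disabled, the flag would be expected to be absent even though the core design supports the instructions.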

Cache

Lion Cove introduces an expanded cache hierarchy with four caching tiers rather than three. With select Broadwell SKUs in 2015, Intel added 128 MB of eDRAM that acted like a fourth-level cache. However, this eDRAM was not a traditional cache: it was placed on a separate die and served as a form of slower shared memory between the CPU cores and graphics, with the intended purpose of reducing memory access requests. [9] Broadwell's L3 cache had about one third of the eDRAM's latency and over triple its bandwidth. [9] The last time Intel added a new level of traditional cache was in 2003, with the L3 cache on the Pentium 4 Extreme Edition. [10]

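| Cache | Property | Redwood Cove | Lion Cove |
| --- | --- | --- | --- |
| L0D | Size | 48 KB | 48 KB |
| L0D | Associativity | 12-way | 12-way |
| L0D | Latency | 5 cycles | 4 cycles |
| L0D | Bandwidth | ___ B/clk | 128 B/clk |
| L0I | Size | 64 KB | 64 KB |
| L0I | Associativity | _-way | 16-way |
| L0I | Latency | - cycles | - cycles |
| L0I | Bandwidth | 32 B/clk | 128 B/clk |
| L1 | Size | | 192 KB |
| L1 | Associativity | | 12-way |
| L1 | Latency | | 9 cycles |
| L1 | Bandwidth | | 2×64 B/clk |
| L2 | Size | 2 MB | 2.5–3 MB |
| L2 | Associativity | 16-way | 10-way |
| L2 | Latency | 16 cycles | 17 cycles |
| L2 | Bandwidth | __ B/clk | 2×64 B/clk |
| L3 | Size | 4 MB | 4 MB |
| L3 | Associativity | 12-way | 12-way |
| L3 | Latency | 75 cycles | 51 cycles |
| L3 | Bandwidth | 32 B/clk read, 32 B/clk write | 32 B/clk read, 32 B/clk write |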

L0

Lion Cove's L0 caches are what other CPU core architectures call the L1 data and instruction caches. Even though Intel has kept the larger cache sizes of its recent core architectures, it has reduced the load-to-use latency to 4 cycles, a figure not seen since Skylake, from 5 cycles in Redwood Cove. [7]

L1

The new 192 KB L1 cache in the Lion Cove core acts as a mid-level buffer cache between the L0 data and instruction caches inside the core and the L2 cache outside the core. It is focussed on reducing latency in the event of L0 data cache misses rather than needing to access the L2 cache. Accessing data in the L1 cache comes with a 9-cycle latency which is nearly half the latency that comes with accessing the L2 cache. [11]
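
The effect of a mid-level tier on average load latency can be sketched with a simple expected-value calculation. The latencies below (4, 9, 17 and 51 cycles) are the Lion Cove figures quoted in this article. The hit rates and the DRAM latency are hypothetical values chosen only to make the arithmetic concrete, and the function name is an assumption of this example.

```python
def average_load_latency(levels, memory_latency):
    """Expected load-to-use latency in cycles.

    levels: list of (latency_cycles, local_hit_rate), nearest cache first.
    memory_latency: cycles for a load that misses every cache level.
    """
    expected = 0.0
    reach = 1.0  # fraction of loads that get as far as the current level
    for latency, hit_rate in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * memory_latency


# Hypothetical workload: every hit rate and the DRAM latency are assumptions.
HYPOTHETICAL_DRAM_CYCLES = 400
with_l1 = average_load_latency(
    [(4, 0.90), (9, 0.50), (17, 0.60), (51, 0.80)], HYPOTHETICAL_DRAM_CYCLES
)
# Same loads without the 9-cycle tier: L0 misses go straight to the 17-cycle L2.
# 0.80 = 0.50 + 0.50 * 0.60 keeps the share of loads served by L2 or closer equal.
without_l1 = average_load_latency(
    [(4, 0.90), (17, 0.80), (51, 0.80)], HYPOTHETICAL_DRAM_CYCLES
)
print(f"with L1: {with_l1:.2f} cycles, without L1: {without_l1:.2f} cycles")
```

Under these assumptions the only difference between the two cases is that some L0 misses are served in 9 cycles instead of 17, which lowers the average.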

L2

The L2 cache is important to the Lion Cove core architecture because Intel relies on it to insulate the cores from the L3 cache's slow performance. [8] Lion Cove was designed to accommodate L2 caches configurable from 2.5 MB up to 3 MB depending on the product. Lunar Lake's Lion Cove implementation contains a 2.5 MB L2 cache while the Lion Cove variant in Arrow Lake contains a 3 MB L2 cache. Lion Cove's larger L2 cache continues the trend of Intel increasing the L2 cache size over the last few generations of its P-cores, such as Golden Cove, Raptor Cove and Redwood Cove. The previous generation Redwood Cove P-core architecture featured 2 MB of L2 cache. However, increasing the cache size often brings higher latency. Lion Cove's L2 cache has a 17-cycle latency, up from Redwood Cove's 16-cycle latency. [11] [12] Theoretically, the L2 cache can deliver a bandwidth of 110 bytes per cycle, but this was limited to 64 bytes per cycle in Lunar Lake for power savings. [7]

L3

The read bandwidth available to a single Lion Cove core accessing the L3 cache has regressed from 16 bytes per cycle with Redwood Cove to 10 bytes per cycle. Despite this lower bandwidth, the latency of accessing L3 data has been reduced from 75 cycles to 51 cycles. [8]
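
Per-cycle figures can be converted into time and throughput once a clock frequency is fixed. The sketch below uses the L3 numbers quoted above together with an assumed 4.0 GHz clock. That frequency is hypothetical, and the calculation also assumes, for simplicity, that both cores and the cache are measured against the same clock.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Latency in nanoseconds: one cycle lasts 1 / clock_ghz nanoseconds."""
    return cycles / clock_ghz


def bytes_per_cycle_to_gb_per_s(bytes_per_cycle, clock_ghz):
    """Bandwidth in GB/s (10^9 bytes per second) at the given clock."""
    return bytes_per_cycle * clock_ghz


ASSUMED_CLOCK_GHZ = 4.0  # hypothetical, not a measured or published frequency

# (core, L3 latency in cycles, single-core L3 read bandwidth in bytes per cycle)
l3_figures = [("Redwood Cove", 75, 16), ("Lion Cove", 51, 10)]
for core, latency_cycles, read_bytes_per_cycle in l3_figures:
    latency_ns = cycles_to_ns(latency_cycles, ASSUMED_CLOCK_GHZ)
    bandwidth = bytes_per_cycle_to_gb_per_s(read_bytes_per_cycle, ASSUMED_CLOCK_GHZ)
    print(f"{core}: {latency_ns:.2f} ns L3 latency, {bandwidth:.0f} GB/s L3 read")
```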

References

  1. Campbell, Mark (June 4, 2024). "Why are Intel ditching Hyperthreading with Lion Cove and Lunar Lake?". OC3D. Retrieved December 2, 2024.
  2. "Next Gen P-core: The Lion Cove Architecture" (PDF). Intel. June 3, 2024. Retrieved December 2, 2024.
  3. Mujtaba, Hassan (June 3, 2024). "Intel Lunar Lake CPU Architecture Deep-Dive: Lion Cove +14% IPC, Skymont IPC More Than Raptor Cove, Next-Gen Power Managment & Scheduling". Wccftech. Retrieved December 2, 2024.
  4. Lam, Chester (September 22, 2024). "Intel's Redwood Cove: Baby Steps are Still Steps". Chips and Cheese. Retrieved December 2, 2024.
  5. Killian, Zak (June 3, 2024). "Intel Lunar Lake CPU Deep Dive: Chipzilla's Mobile Moonshot". HotHardware. Retrieved December 2, 2024.
  6. "Intel Core Ultra Arrow Lake Preview". TechPowerUp. October 10, 2024. Retrieved December 2, 2024.
  7. Cozma, George (June 4, 2024). "Intel's Lion Cove Architecture Preview". Chips and Cheese. Retrieved December 2, 2024.
  8. Lam, Chester (September 27, 2024). "Lion Cove: Intel's P-Core Roars". Chips and Cheese. Retrieved December 2, 2024.
  9. Cutress, Ian (November 2, 2020). "A Broadwell Retrospective Review in 2020: Is eDRAM Still Worth It?". AnandTech. Retrieved December 2, 2024.
  10. Shimpi, Anand Lal (September 16, 2003). "Intel Developer Forum Fall 2003 - Day 1: Introducing Pentium 4 Extreme Edition". AnandTech. Retrieved December 2, 2024.
  11. Bonshor, Gavin (June 3, 2024). "Intel Unveils Lunar Lake Architecture: New P and E cores, Xe2-LPG Graphics, New NPU 4 Brings More AI Performance". AnandTech. Retrieved December 2, 2024.
  12. Lam, Chester (January 11, 2024). "Previewing Meteor Lake at CES". Chips and Cheese. Retrieved December 2, 2024.
