Multi-channel memory architecture

In digital electronics and computer hardware, multi-channel memory architecture is a technology that increases the data transfer rate between DRAM and the memory controller by adding more communication channels between them. In theory, this multiplies the data rate by the number of channels present. Dual-channel memory employs two channels. The technique dates back to the 1960s, when it was used in the IBM System/360 Model 91 and the CDC 6600. [1]

Modern high-end desktop and workstation processors such as the AMD Ryzen Threadripper series and the Intel Core i9 Extreme Edition lineup support quad-channel memory. Server processors from the AMD Epyc series and the Intel Xeon platforms support memory configurations ranging from quad-channel up to octa-channel layouts. [2] In March 2010, AMD released Socket G34 and Magny-Cours Opteron 6100 series [3] processors with support for quad-channel memory. In 2006, Intel released chipsets that support quad-channel memory for its LGA771 platform [4] and later, in 2011, for its LGA2011 platform. [5] Microcomputer chipsets with even more channels have been designed; for example, the chipset in the AlphaStation 600 (1995) supports eight-channel memory, but the backplane of the machine limited operation to four channels. [6]

Dual-channel architecture

Dual-channel memory slots, color-coded orange and yellow for this particular motherboard.

Dual-channel-enabled memory controllers in a PC system architecture use two 64-bit data channels. Dual-channel should not be confused with double data rate (DDR), in which data exchange happens twice per DRAM clock. The two technologies are independent of each other, and many motherboards use both by using DDR memory in a dual-channel configuration.

Operation

Dual-channel architecture requires a dual-channel-capable motherboard and two or more DDR memory modules. The memory modules are installed into matching banks, each of which belongs to a different channel. The motherboard's manual explains how to install memory for that particular board. A matched pair of memory modules can usually be placed in the first bank of each channel, and a pair of a different capacity in the second bank. [7] Modules rated at different speeds can be run in dual-channel mode, although the motherboard will then run all memory modules at the speed of the slowest module. Some motherboards, however, have compatibility issues with certain brands or models of memory when used in dual-channel mode. For this reason, it is generally advised to use identical pairs of memory modules, which is why most memory manufacturers now sell "kits" of matched-pair DIMMs. Several motherboard manufacturers only support configurations where a "matched pair" of modules is used. A matching pair needs to match in:

Theoretically any matched pair of memory modules may be used in either single- or dual-channel operation, provided the motherboard supports this architecture.
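The slowest-module rule mentioned above can be illustrated with a minimal sketch. The function name and the example speed ratings are hypothetical, and the sketch models only the speed rule, not any other matching criteria a motherboard may impose.

```python
def effective_speed_mt_s(module_speeds_mt_s):
    """Return the speed (in MT/s) at which all installed modules run.

    Models only the rule that mixed-speed modules all run at the
    speed of the slowest module.
    """
    if not module_speeds_mt_s:
        raise ValueError("at least one module is required")
    return min(module_speeds_mt_s)


# Hypothetical mixed pair: one module rated 3200 MT/s, one rated 2666 MT/s.
mixed_pair = [3200, 2666]
print(effective_speed_mt_s(mixed_pair))  # 2666
```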

With DDR5, each DIMM provides two independent 32-bit sub-channels rather than a single 64-bit channel.

Performance

In theory, a dual-channel configuration doubles the memory bandwidth of a single-channel configuration. This should not be confused with double data rate (DDR) memory, which doubles the transfer rate of the DRAM bus by transferring data on both the rising and falling edges of the memory bus clock signal.
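The two doublings are independent, which a short calculation makes concrete. This is a sketch under stated assumptions: the function name is hypothetical, the 64-bit channel width is taken from the dual-channel description earlier in the article, and the 800 MHz bus clock is an illustrative value only.

```python
def peak_bandwidth_gb_s(channels, transfers_per_second, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfers_per_second / 1e9


# Hypothetical bus clock; DDR moves data on both clock edges,
# so there are two transfers per clock cycle.
bus_clock_hz = 800_000_000
ddr_transfers = 2 * bus_clock_hz

single = peak_bandwidth_gb_s(1, ddr_transfers)
dual = peak_bandwidth_gb_s(2, ddr_transfers)
print(single, dual)  # 12.8 25.6
```

These are theoretical ceilings; measured gains depend on the workload.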

Ganged versus unganged

Dual-channel was originally conceived as a way to maximize memory throughput by combining two 64-bit buses into a single 128-bit bus. [disputed] [citation needed] This is retrospectively called the "ganged" mode. However, due to lackluster performance gains in consumer applications, [8] more modern implementations of dual-channel use the "unganged" mode by default, which maintains two 64-bit memory buses but allows independent access to each channel, in support of multithreading with multi-core processors. [9] [10]

The difference between "ganged" and "unganged" modes is analogous to the difference between RAID 0 and JBOD. [11] With RAID 0 (analogous to "ganged" mode), an additional logic layer is responsible for spreading the load, ideally evenly, across all available hardware units (storage devices or memory modules) to increase overall performance. With JBOD (analogous to "unganged" mode), the system instead relies on statistical usage patterns to spread the load across the available hardware units and so increase overall performance. [9] [10]

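The trade-off between the two modes can be illustrated with a deliberately simplified toy model. Everything in it is an assumption made for illustration: requests are 64-byte cache lines, each request costs a fixed 20-cycle overhead plus its transfer time, and a request is represented only by the channel it would target in unganged mode. Real memory controllers are far more complex.

```python
from collections import Counter

LINE_BYTES = 64        # assumed request size (one cache line)
ACCESS_OVERHEAD = 20   # assumed fixed cycles per request (hypothetical)


def ganged_cycles(requests):
    """One 128-bit bus: 16 bytes per cycle, requests served one at a time."""
    per_request = ACCESS_OVERHEAD + LINE_BYTES // 16
    return len(requests) * per_request


def unganged_cycles(requests):
    """Two 64-bit buses: 8 bytes per cycle each, channels work in parallel.

    Total time is set by the busier channel.
    """
    per_request = ACCESS_OVERHEAD + LINE_BYTES // 8
    load = Counter(requests)
    return max(load.values(), default=0) * per_request


balanced = [0, 1, 0, 1, 0, 1, 0, 1]  # requests alternate between channels
skewed = [0] * 8                     # every request hits channel 0

print(ganged_cycles(balanced), unganged_cycles(balanced))  # 192 112
print(ganged_cycles(skewed), unganged_cycles(skewed))      # 192 224
```

In this model, unganged mode wins when requests happen to spread across both channels and loses when they pile onto one, which mirrors its reliance on usage patterns.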
Triple-channel architecture

Operation

DDR3 triple-channel architecture is used in the Intel Core i7-900 series (the Intel Core i7-800 series supports only up to dual-channel). The LGA 1366 platform (e.g., Intel X58) supports DDR3 triple-channel memory, normally at 1333 and 1600 MHz, but can run at higher clock speeds on certain motherboards. AMD Socket AM3 processors do not use the DDR3 triple-channel architecture but instead use dual-channel DDR3 memory. The same applies to the Intel Core i3, Core i5 and Core i7-800 series, which are used on the LGA 1156 platforms (e.g., Intel P55). According to Intel, a Core i7 with DDR3 operating at 1066 MHz will offer peak data transfer rates of 25.6 GB/s when operating in triple-channel interleaved mode. This, Intel claims, leads to faster system performance as well as higher performance per watt. [12]
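As a rough check of the 25.6 GB/s figure, assume each channel is 64 bits (8 bytes) wide and that "1066 MHz" denotes roughly 1066 million transfers per second per channel. Both are assumptions made here for illustration, not figures taken from the Intel brief.

$$
B_{\text{peak}} = 3 \times 8\ \text{B} \times 1066 \times 10^{6}\ \text{s}^{-1} \approx 25.6 \times 10^{9}\ \text{B/s} = 25.6\ \text{GB/s}
$$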

When operating in triple-channel mode, memory latency is reduced due to interleaving: each module is accessed in turn for small pieces of data, rather than one module being filled completely before the next is used. Data is spread among the modules in an alternating pattern, potentially tripling the available memory bandwidth for the same amount of data compared with storing it all on one module.
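The alternating pattern can be sketched as a simple round-robin mapping from address to channel. The function name and the 64-byte interleave granule are assumptions for illustration; actual controllers may use different granules or hashed mappings.

```python
def channel_for_address(address, channels=3, granule_bytes=64):
    """Map a physical address to a channel by round-robin interleaving."""
    return (address // granule_bytes) % channels


# Six consecutive 64-byte blocks land on channels 0, 1, 2, 0, 1, 2.
addresses = range(0, 6 * 64, 64)
print([channel_for_address(a) for a in addresses])  # [0, 1, 2, 0, 1, 2]
```

Because consecutive blocks fall on different channels, a sequential read can keep all three channels busy at once.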

The architecture can only be used when all three, or a multiple of three, memory modules are identical in capacity and speed, and are placed in three-channel slots. When two memory modules are installed, the architecture will operate in dual-channel architecture mode. [13]

Supporting processors

Quad-channel architecture

Operation

Quad-channel memory debuted in 2010 on Intel's Nehalem-EX (Beckton) Xeon CPUs for the LGA 1567 platform, and was introduced to the high-end product line on the Intel X79 LGA 2011 platform with Sandy Bridge-E in late 2011. DDR4 replaced DDR3 on the Intel X99 LGA 2011-v3 platform (Haswell-E) and is also used in AMD's Threadripper platform. [16] DDR3 quad-channel architecture is used in the AMD G34 platform and in the aforementioned Intel CPUs prior to Haswell. AMD processors for the C32 platform and Intel processors for the LGA 1155 platform (e.g., Intel Z68) use dual-channel DDR3 memory instead.

The architecture can be used only when all four memory modules (or a multiple of four) are identical in capacity and speed, and are placed in quad-channel slots. When two memory modules are installed, the architecture will operate in dual-channel mode; when three memory modules are installed, it will operate in triple-channel mode. [13]
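The fallback behaviour described above can be summarised in a small sketch. The function name is hypothetical, the modules are assumed to be identical and correctly slotted, and treating a single module as single-channel is an assumption. Configurations the text does not describe return None instead of a guess.

```python
def channel_mode(identical_modules, max_channels=4):
    """Return the number of channels in use, or None if the
    configuration is not covered by the rules sketched here."""
    if identical_modules <= 0:
        raise ValueError("at least one module is required")
    if identical_modules % max_channels == 0:
        return max_channels        # full multi-channel mode
    if identical_modules < max_channels:
        return identical_modules   # fall back to fewer channels
    return None                    # e.g. five modules on a quad-channel board


for count in (2, 3, 4, 8):
    print(count, channel_mode(count))
```

Passing max_channels=3 models the triple-channel platforms, where two modules fall back to dual-channel mode.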

Performance

A benchmark performed by TweakTown, using SiSoftware Sandra, measured a performance increase of around 70% for a quad-channel configuration compared with a dual-channel configuration. [17] :p. 5 Other tests performed by TweakTown on the same subject showed no significant differences in performance, leading to the conclusion that not all benchmark software can exploit the increased parallelism offered by multi-channel memory configurations. [17] :p. 6

Supporting processors

Hexa-channel architecture

Hexa-channel memory is supported by Qualcomm Centriq server processors [20] and by processors from the Intel Xeon Scalable platform. [21]

Octa-channel architecture

Octa-channel setup with an AMD Ryzen Threadripper Pro 7995WX

Octa-channel memory is supported by Cavium ThunderX2 server processors, AMD's Epyc server processors, and the Threadripper PRO lineup of professional-class workstation processors. [22] [23] [24]

Dodeca-channel architecture

Dodeca-channel, or 12-channel, memory architecture was introduced with AMD's Epyc 9004 server processors, released in 2022, which use DDR5 memory. [25]

References

  1. Jacob, Bruce; Ng, Spencer; Wang, David (2007). Memory systems: cache, DRAM, disk. Morgan Kaufmann. p. 318. ISBN 978-0-12-379751-3.
  2. jpringle (September 12, 2017). "Benchmarking Epyc, Ryzen, and Xeon: Tyranny of Memory". Regional Ocean Modeling System. Retrieved April 23, 2021.
  3. "Opteron 6000 Series Platform Quick Reference Guide" (PDF). AMD. Archived from the original (PDF) on 2012-05-12. Retrieved 2012-10-15.
  4. 5000P memory controller, Intel.
  5. Intel LGA2011 socket x68 express chipset pictured, Tech power up.
  6. John H. Zurawski; John E. Murray; Paul J. Lemmon, "The Design and Verification of the AlphaStation 600 5-series Workstation", HP, 7 (1), archived from the original on 2021-02-25, retrieved 2011-10-19.
  7. "Intel Dual-Channel DDR Memory Architecture White Paper" (PDF) (Rev. 1.0 ed.). Infineon Technologies North America and Kingston Technology. September 2003. Archived from the original (PDF, 1021 KB) on 2011-09-29. Retrieved 2007-09-06.
  8. "AMD Phenom X4 Memory Controller in the Ganged/ Unganged Mode". ixbtlabs.com. 2008-08-16. Retrieved 2014-01-09.
  9. Gionatan Danti (2010-06-17). "The Phenom / PhenomII memory controller: ganged vs unganged mode benchmarked". ilsistemista.net. Retrieved 2014-01-09.
  10. "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors" (PDF). amd.com. 2013-01-11. pp. 107–108. Retrieved 2014-01-09. When the DCTs are in ganged mode, as specified by [The DRAM Controller Select Low Register] F2x110 [DctGangEn], then each logical DIMM is two channels wide. Each physical DIMM of a 2-channel logical DIMM is required to be the same size and use the same timing parameters. Both DCTs must be programmed with the same information (see 2.8.1 [DCT Configuration Registers]). When the DCTs are in unganged mode, a logical DIMM is equivalent to a 64-bit physical DIMM and each channel is controlled by a different DCT. Typical systems are recommended to run in unganged mode to benefit from the additional parallelism generated by using the DCTs independently. See 2.12.2 [DRAM Considerations for ECC] for DRAM ECC implications of ganged and unganged mode. Ganged mode is not supported for S1g3, S1g4, ASB2, and G34 processors.
  11. Rouse, Margaret (September 2005). "JBOD (just a bunch of disks or just a bunch of drives)". SearchStorage.TechTarget.com. Retrieved 2014-01-09.
  12. X58 Product Brief (PDF), Intel
  13. Desktop Boards – Triple Memory Modules, Intel, archived from the original on 2009-03-08, retrieved 2011-10-01, Single- and Multichannel Memory Modes
  14. "Core i7 Family Product Comparison". Intel. Memory Specifications: # of Memory Channels.
  15. "Xeon Family Product Comparison". Intel. Memory Specifications: # of Memory Channels.
  16. AMD Ryzen Threadripper And Vega Attack Prey At 4K, Quad GPUs Shred Blender, Radeon RX Vega Hits In July, "....with 16 cores and 32 threads with support for quad-channel DDR4 memory......"
  17. Shawn Baker (2011-11-16). "Intel X79 Quad Channel and Z68 Dual Channel Memory Performance Analysis". TweakTown. Retrieved 2013-11-30.
  18. "AMD Opteron 6300 Series processor Quick Reference Guide" (PDF). Retrieved 2013-12-11.
  19. "AMD Opteron 6200 Series Processor Quick Reference Guide" (PDF). Retrieved 2012-10-15.
  20. Kennedy, Patrick (23 August 2017). "Qualcomm Centriq 2400 ARM CPU from Hot Chips 29". Serve The Home. Retrieved 14 November 2017.
  21. "Intel® Xeon® Bronze 3106 Processor (11M Cache, 1.70 GHz)". www.intel.in.
  22. Cutress, Ian (7 March 2017). "AMD Prepares 32-Core Naples CPUs for 1P and 2P Servers: Coming in Q2". Anandtech. Retrieved 7 March 2017.
  23. Kennedy, Patrick (9 November 2017). "Cavium ThunderX2 and OCP Platform Details". Serve the Home. Retrieved 14 November 2017.
  24. Cutress, Ian (July 14, 2021). "AMD Threadripper Pro Review: An Upgrade Over Regular Threadripper?". AnandTech. Retrieved August 18, 2021.
  25. Goetting, Chris (2022-11-10). "AMD 4th Gen EPYC 9004 Series Launched: Genoa Tested In A Data Center Benchmark Gauntlet". HotHardware. Retrieved 2023-12-07.