Memory hierarchy

Last updated
Diagram of the computer memory hierarchy ComputerMemoryHierarchy.svg
Diagram of the computer memory hierarchy

In computer organisation, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies. [1] Memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower level programming constructs involving locality of reference.

Contents

Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories (m1, m2, ..., mn) in which each member mi is typically smaller and faster than the next highest member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling for activating the transfer.

There are four major storage levels. [1]

This is a general memory hierarchy structuring. Many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, and one can include a level of nearline storage between online and offline storage.

Properties of the technologies in the memory hierarchy

Examples

Memory hierarchy of an AMD Bulldozer server. Hwloc.png
Memory hierarchy of an AMD Bulldozer server.

The number of levels in the memory hierarchy and the performance at each level has increased over time. The type of memory or storage components also change historically. [6] For example, the memory hierarchy of an Intel Haswell Mobile [7] processor circa 2013 is:

The lower levels of the hierarchy – from disks downwards – are also known as tiered storage. The formal distinction between online, nearline, and offline storage is: [12]

For example, always-on spinning disks are online, while spinning disks that spin-down, such as massive array of idle disk (MAID), are nearline. Removable media such as tape cartridges that can be automatically loaded, as in a tape library, are nearline, while cartridges that must be manually loaded are offline.

Most modern CPUs are so fast that for most program workloads, the bottleneck is the locality of reference of memory accesses and the efficiency of the caching and memory transfer between different levels of the hierarchy[ citation needed ]. As a result, the CPU spends much of its time idling, waiting for memory I/O to complete. This is sometimes called the space cost, as a larger memory object is more likely to overflow a small/fast level and require use of a larger/slower level. The resulting load on memory use is known as pressure (respectively register pressure, cache pressure, and (main) memory pressure). Terms for data being missing from a higher level and needing to be fetched from a lower level are, respectively: register spilling (due to register pressure: register to cache), cache miss (cache to main memory), and (hard) page fault (main memory to disk).

Modern programming languages mainly assume two levels of memory, main memory and disk storage, though in assembly language and inline assemblers in languages such as C, registers can be directly accessed. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from the operating system):

Many programmers assume one level of memory. This works fine until the application hits a performance wall. Then the memory hierarchy will be assessed during code refactoring.

See also

Related Research Articles

<span class="mw-page-title-main">Computer data storage</span> Storage of digital data readable by computers

Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).

In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference locality – temporal and spatial locality. Temporal locality refers to the reuse of specific data and/or resources within a relatively small time duration. Spatial locality refers to the use of data elements within relatively close storage locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, such as traversing the elements in a one-dimensional array.

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algorithm can be measured based on the usage of different resources. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.

In computer operating systems, memory paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. In this scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.

A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. It is a part of the chip's memory-management unit (MMU). A TLB may reside between the CPU and the CPU cache, between CPU cache and the main memory or between the different levels of the multi-level cache. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and it is nearly always present in any processor that utilizes paged or segmented virtual memory.

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels, with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels, or even any level, sometimes some latter or all levels are implemented with eDRAM.

<span class="mw-page-title-main">Linear Tape-Open</span> Magnetic tape-based data storage technology

Linear Tape-Open (LTO) is a magnetic tape data storage technology originally developed in the late 1990s as an open standards alternative to the proprietary magnetic tape formats that were available at the time. Hewlett Packard Enterprise, IBM, and Quantum control the LTO Consortium, which directs development and manages licensing and certification of media and mechanism manufacturers.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

Hierarchical storage management (HSM), also known as Tiered storage, is a data storage and Data management technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as solid state drive arrays, are more expensive than slower devices, such as hard disk drives, optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.

In computing, a hybrid drive is a logical or physical storage device that combines a faster storage medium such as solid-state drive (SSD) with a higher-capacity hard disk drive (HDD). The intent is adding some of the speed of SSDs to the cost-effective storage capacity of traditional HDDs. The purpose of the SSD in a hybrid drive is to act as a cache for the data stored on the HDD, improving the overall performance by keeping copies of the most frequently used data on the faster SSD drive.

Scratchpad memory (SPM), also known as scratchpad, scratchpad RAM or local store in computer terminology, is an internal memory, usually high-speed, used for temporary storage of calculations, data, and other work in progress. In reference to a microprocessor, scratchpad refers to a special high-speed memory used to hold small items of data for rapid retrieval. It is similar to the usage and size of a scratchpad in life: a pad of paper for preliminary notes or sketches or writings, etc. When the scratchpad is a hidden portion of the main memory then it is sometimes referred to as bump storage.

<span class="mw-page-title-main">Solid-state drive</span> Data storage device

A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data persistently, typically using flash memory, and functioning as secondary storage in the hierarchy of computer storage. It is also sometimes called a semiconductor storage device, a solid-state device or a solid-state disk, even though SSDs lack the physical spinning disks and movable read–write heads used in hard disk drives (HDDs) and floppy disks. SSD also has rich internal parallelism for data processing.

Universal memory refers to a computer data storage device combining the cost benefits of DRAM, the speed of SRAM, the non-volatility of flash memory along with infinite durability, and longevity. Such a device, if it ever becomes possible to develop, would have a far-reaching impact on the computer market. Some doubt that such a type of memory will ever be possible.

<span class="mw-page-title-main">Random-access memory</span> Form of computer data storage

Random-access memory is a form of computer memory that can be read and changed in any order, typically used to store working data and machine code. A random-access memory device allows data items to be read or written in almost the same amount of time irrespective of the physical location of data inside the memory, in contrast with other direct-access data storage media, where the time required to read and write data items varies significantly depending on their physical locations on the recording medium, due to mechanical limitations such as media rotation speeds and arm movement.

This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.

Virtual memory compression is a memory management technique that utilizes data compression to reduce the size or number of paging requests to and from the auxiliary storage. In a virtual memory compression system, pages to be paged out of virtual memory are compressed and stored in physical memory, which is usually random-access memory (RAM), or sent as compressed to auxiliary storage such as a hard disk drive (HDD) or solid-state drive (SSD). In both cases the virtual memory range, whose contents has been compressed, is marked inaccessible so that attempts to access compressed pages can trigger page faults and reversal of the process. The footprint of the data being paged is reduced by the compression process; in the first instance, the freed RAM is returned to the available physical memory pool, while the compressed portion is kept in RAM. In the second instance, the compressed data is sent to auxiliary storage but the resulting I/O operation is smaller and therefore takes less time.

<span class="mw-page-title-main">3D XPoint</span> Discontinued computer memory type

3D XPoint is a discontinued non-volatile memory (NVM) technology developed jointly by Intel and Micron Technology. It was announced in July 2015 and was available on the open market under the brand name Optane (Intel) from April 2017 to July 2022. Bit storage is based on a change of bulk resistance, in conjunction with a stackable cross-grid data access array. Initial prices are less than dynamic random-access memory (DRAM) but more than flash memory.

<span class="mw-page-title-main">Cache hierarchy</span> Memory hierarchy concept applied to CPU caches with multiple levels

Cache hierarchy, or multi-level caches, refers to a memory architecture that uses a hierarchy of memory stores based on varying access speeds to cache data. Highly requested data is cached in high-speed access memory stores, allowing swifter access by central processing unit (CPU) cores.

References

  1. 1 2 Toy, Wing; Zee, Benjamin (1986). Computer Hardware/Software Architecture. Prentice Hall. p.  30. ISBN   0-13-163502-6.
  2. Write-combining
  3. "Memory Hierarchy". Unitity Semiconductor Corporation. Archived from the original on 5 August 2009. Retrieved 16 September 2009.
  4. Pádraig Brady. "Multi-Core" . Retrieved 16 September 2009.
  5. 1 2 3 van der Pas, Ruud (2002). "Memory Hierarchy in Cache-Based Systems" (PDF). Santa Clara, California: Sun Microsystems: 26. 817-0742-10.{{cite journal}}: Cite journal requires |journal= (help)
  6. "Memory & Storage - Timeline of Computer History - Computer History Museum". www.computerhistory.org.
  7. Crothers, Brooke. "Dissecting Intel's top graphics in Apple's 15-inch MacBook Pro - CNET". News.cnet.com. Retrieved 2014-07-31.
  8. "Intel's Haswell Architecture Analyzed: Building a New PC and a New Intel". AnandTech. Retrieved 2014-07-31.
  9. 1 2 3 4 5 "SiSoftware Zone". Sisoftware.co.uk. Archived from the original on 2014-09-13. Retrieved 2014-07-31.
  10. "Samsung 960 Pro M.2 NVMe SSD Review". storagereview.com. Retrieved 2017-04-13.
  11. "Ultrium - LTO Technology - Ultrium GenerationsLTO". Lto.org. Archived from the original on 2011-07-27. Retrieved 2014-07-31.
  12. Pearson, Tony (2010). "Correct use of the term Nearline". IBM Developerworks, Inside System Storage. Archived from the original on 2018-11-27. Retrieved 2015-08-16.