Cache (computing)

Last updated

Diagram of a CPU memory cache operation Cache,basic.svg
Diagram of a CPU memory cache operation

In computing, a cache ( /kæʃ/ KASH) [1] is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs. [2]

Contents

To be cost-effective, caches must be relatively small. Nevertheless, caches are effective in many areas of computing because typical computer applications access data with a high degree of locality of reference. Such access patterns exhibit temporal locality, where data is requested that has been recently requested, and spatial locality, where data is requested that is stored near data that has already been requested.

Motivation

In memory design, there is an inherent trade-off between capacity and speed because larger capacity implies larger size and thus greater physical distances for signals to travel causing propagation delays. There is also a tradeoff between high-performance technologies such as SRAM and cheaper, easily mass-produced commodities such as DRAM, flash, or hard disks.

The buffering provided by a cache benefits one or both of latency and throughput (bandwidth).

A larger resource incurs a significant latency for access – e.g. it can take hundreds of clock cycles for a modern 4 GHz processor to reach DRAM. This is mitigated by reading large chunks into the cache, in the hope that subsequent reads will be from nearby locations and can be read from the cache. Prediction or explicit prefetching can be used to guess where future reads will come from and make requests ahead of time; if done optimally, the latency is bypassed altogether.

The use of a cache also allows for higher throughput from the underlying resource, by assembling multiple fine-grain transfers into larger, more efficient requests. In the case of DRAM circuits, the additional throughput may be gained by using a wider data bus.

Operation

Hardware implements cache as a block of memory for temporary storage of data likely to be used again. Central processing units (CPUs), solid-state drives (SSDs) and hard disk drives (HDDs) frequently include hardware-based cache, while web browsers and web servers commonly rely on software caching.

A cache is made up of a pool of entries. Each entry has associated data, which is a copy of the same data in some backing store. Each entry also has a tag, which specifies the identity of the data in the backing store of which the entry is a copy.

When the cache client (a CPU, web browser, operating system) needs to access data presumed to exist in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired data, the data in the entry is used instead. This situation is known as a cache hit. For example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the content of the web page is the data. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.

The alternative situation, when the cache is checked and found not to contain any entry with the desired tag, is known as a cache miss. This requires a more expensive access of data from the backing store. Once the requested data is retrieved, it is typically copied into the cache, ready for the next access.

During a cache miss, some other previously existing cache entry is typically removed in order to make room for the newly retrieved data. The heuristic used to select the entry to replace is known as the replacement policy. One popular replacement policy, least recently used (LRU), replaces the oldest entry, the entry that was accessed less recently than any other entry. More sophisticated caching algorithms also take into account the frequency of use of entries.

Write policies

A write-through cache without write allocation Write-through with no-write-allocation.svg
A write-through cache without write allocation
A write-back cache with write allocation Write-back with write-allocation.svg
A write-back cache with write allocation

Cache writes must eventually be propagated to the backing store. The timing for this is governed by the write policy. The two primary write policies are: [3]

A write-back cache is more complex to implement since it needs to track which of its locations have been written over and mark them as dirty for later writing to the backing store. The data in these locations are written back to the backing store only when they are evicted from the cache, a process referred to as a lazy write. For this reason, a read miss in a write-back cache may require two memory accesses to the backing store: one to write back the dirty data, and one to retrieve the requested data. Other policies may also trigger data write-back. The client may make many changes to data in the cache, and then explicitly notify the cache to write back the data.

Write operations do not return data. Consequently, a decision needs to be made for write misses: whether or not to load the data into the cache. This is determined by these write-miss policies:

While both write policies can Implement either write-miss policy, they are typically paired as follows: [4] [5]

Entities other than the cache may change the data in the backing store, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the client updates the data in the cache, copies of that data in other caches will become stale. Communication protocols between the cache managers that keep the data consistent are associated with cache coherence.

Prefetch

On a cache read miss, caches with a demand paging policy read the minimum amount from the backing store. A typical demand-paging virtual memory implementation reads one page of virtual memory (often 4 KB) from disk into the disk cache in RAM. A typical CPU reads a single L2 cache line of 128 bytes from DRAM into the L2 cache, and a single L1 cache line of 64 bytes from the L2 cache into the L1 cache.

Caches with a prefetch input queue or more general anticipatory paging policy go further—they not only read the data requested, but guess that the next chunk or two of data will soon be required, and so prefetch that data into the cache ahead of time. Anticipatory paging is especially helpful when the backing store has a long latency to read the first chunk and much shorter times to sequentially read the next few chunks, such as disk storage and DRAM.

A few operating systems go further with a loader that always pre-loads the entire executable into RAM. A few caches go even further, not only pre-loading an entire file, but also starting to load other related files that may soon be requested, such as the page cache associated with a prefetcher or the web cache associated with link prefetching.

Examples of hardware caches

CPU cache

Small memories on or close to the CPU can operate faster than the much larger main memory. [6] Most CPUs since the 1980s have used one or more caches, sometimes in cascaded levels; modern high-end embedded, desktop and server microprocessors may have as many as six types of cache (between levels and functions). [7] Some examples of caches with a specific function are the D-cache, I-cache and the translation lookaside buffer for the memory management unit (MMU).

GPU cache

Earlier graphics processing units (GPUs) often had limited read-only texture caches and used swizzling to improve 2D locality of reference. Cache misses would drastically affect performance, e.g. if mipmapping was not used. Caching was important to leverage 32-bit (and wider) transfers for texture data that was often as little as 4 bits per pixel.

As GPUs advanced, supporting general-purpose computing on graphics processing units and compute kernels, they have developed progressively larger and increasingly general caches, including instruction caches for shaders, exhibiting functionality commonly found in CPU caches. These caches have grown to handle synchronization primitives between threads and atomic operations, and interface with a CPU-style MMU.

DSPs

Digital signal processors have similarly generalized over the years. Earlier designs used scratchpad memory fed by direct memory access, but modern DSPs such as Qualcomm Hexagon often include a very similar set of caches to a CPU (e.g. Modified Harvard architecture with shared L2, split L1 I-cache and D-cache). [8]

Translation lookaside buffer

A memory management unit (MMU) that fetches page table entries from main memory has a specialized cache, used for recording the results of virtual address to physical address translations. This specialized cache is called a translation lookaside buffer (TLB). [9]

In-network cache

Information-centric networking

Information-centric networking (ICN) is an approach to evolve the Internet infrastructure away from a host-centric paradigm, based on perpetual connectivity and the end-to-end principle, to a network architecture in which the focal point is identified information. Due to the inherent caching capability of the nodes in an ICN, it can be viewed as a loosely connected network of caches, which has unique requirements for caching policies. However, ubiquitous content caching introduces the challenge to content protection against unauthorized access, which requires extra care and solutions. [10]

Unlike proxy servers, in ICN the cache is a network-level solution. Therefore, it has rapidly changing cache states and higher request arrival rates; moreover, smaller cache sizes impose different requirements on the content eviction policies. In particular, eviction policies for ICN should be fast and lightweight. Various cache replication and eviction schemes for different ICN architectures and applications have been proposed.[ citation needed ]

Policies

Time aware least recently used

The time aware least recently used (TLRU) is a variant of LRU designed for the situation where the stored contents in cache have a valid lifetime. The algorithm is suitable in network cache applications, such as ICN, content delivery networks (CDNs) and distributed networks in general. TLRU introduces a new term: time to use (TTU). TTU is a time stamp on content which stipulates the usability time for the content based on the locality of the content and information from the content publisher. Owing to this locality-based time stamp, TTU provides more control to the local administrator to regulate in-network storage.

In the TLRU algorithm, when a piece of content arrives, a cache node calculates the local TTU value based on the TTU value assigned by the content publisher. The local TTU value is calculated by using a locally-defined function. Once the local TTU value is calculated the replacement of content is performed on a subset of the total content stored in cache node. The TLRU ensures that less popular and short-lived content should be replaced with incoming content. [11]

Least frequent recently used

The least frequent recently used (LFRU) cache replacement scheme combines the benefits of LFU and LRU schemes. LFRU is suitable for network cache applications, such as ICN, CDNs and distributed networks in general. In LFRU, the cache is divided into two partitions called privileged and unprivileged partitions. The privileged partition can be seen as a protected partition. If content is highly popular, it is pushed into the privileged partition. Replacement of the privileged partition is done by first evicting content from the unprivileged partition, then pushing content from the privileged partition to the unprivileged partition, and finally inserting new content into the privileged partition. In the above procedure, the LRU is used for the privileged partition and an approximated LFU (ALFU) scheme is used for the unprivileged partition. The basic idea is to cache the locally popular content with the ALFU scheme and push the popular content to the privileged partition. [12]

Weather forecast

In 2011, the use of smartphones with weather forecasting options was overly taxing AccuWeather servers; two requests from the same area would generate separate requests. An optimization by edge-servers to truncate the GPS coordinates to fewer decimal places meant that the cached results from a nearby query would be used. The number of to-the-server lookups per day dropped by half. [13]

Software caches

Disk cache

While CPU caches are generally managed entirely by hardware, a variety of software manages other caches. The page cache in main memory is managed by the operating system kernel.

While the disk buffer, which is an integrated part of the hard disk drive or solid state drive, is sometimes misleadingly referred to as disk cache, its main functions are write sequencing and read prefetching. High-end disk controllers often have their own on-board cache for the hard disk drive's data blocks.

Finally, a fast local hard disk drive can also cache information held on even slower data storage devices, such as remote servers (web cache) or local tape drives or optical jukeboxes; such a scheme is the main concept of hierarchical storage management. Also, fast flash-based solid-state drives (SSDs) can be used as caches for slower rotational-media hard disk drives, working together as hybrid drives.

Web cache

Web browsers and web proxy servers employ web caches to store previous responses from web servers, such as web pages and images. Web caches reduce the amount of information that needs to be transmitted across the network, as information previously stored in the cache can often be re-used. This reduces bandwidth and processing requirements of the web server, and helps to improve responsiveness for users of the web. [14]

Web browsers employ a built-in web cache, but some Internet service providers (ISPs) or organizations also use a caching proxy server, which is a web cache that is shared among all users of that network.

Another form of cache is P2P caching, where the files most sought for by peer-to-peer applications are stored in an ISP cache to accelerate P2P transfers. Similarly, decentralised equivalents exist, which allow communities to perform the same task for P2P traffic, for example, Corelli. [15]

Memoization

A cache can store data that is computed on demand rather than retrieved from a backing store. Memoization is an optimization technique that stores the results of resource-consuming function calls within a lookup table, allowing subsequent calls to reuse the stored results and avoid repeated computation. It is related to the dynamic programming algorithm design methodology, which can also be thought of as a means of caching.

Content delivery network

A content delivery network (CDN) is a network of distributed servers that deliver pages and other Web content to a user, based on the geographic locations of the user, the origin of the web page and the content delivery server.

CDNs began in the late 1990s as a way to speed up the delivery of static content, such as HTML pages, images and videos. By replicating content on multiple servers around the world and delivering it to users based on their location, CDNs can significantly improve the speed and availability of a website or application. When a user requests a piece of content, the CDN will check to see if it has a copy of the content in its cache. If it does, the CDN will deliver the content to the user from the cache. [16]

Cloud storage gateway

A cloud storage gateway, also known as an edge filer, is a hybrid cloud storage device that connects a local network to one or more cloud storage services, typically object storage services such as Amazon S3. It provides a cache for frequently accessed data, providing high speed local access to frequently accessed data in the cloud storage service. Cloud storage gateways also provide additional benefits such as accessing cloud object storage through traditional file serving protocols as well as continued access to cached data during connectivity outages. [17]

Other caches

The BIND DNS daemon caches a mapping of domain names to IP addresses, as does a resolver library.

Write-through operation is common when operating over unreliable networks (like an Ethernet LAN), because of the enormous complexity of the coherency protocol required between multiple write-back caches when communication is unreliable. For instance, web page caches and client-side network file system caches (like those in NFS or SMB) are typically read-only or write-through specifically to keep the network protocol simple and reliable.

Search engines also frequently make web pages they have indexed available from their cache. For example, Google provides a "Cached" link next to each search result. This can prove useful when web pages from a web server are temporarily or permanently inaccessible.

Database caching can substantially improve the throughput of database applications, for example in the processing of indexes, data dictionaries, and frequently used subsets of data.

A distributed cache [18] uses networked hosts to provide scalability, reliability and performance to the application. [19] The hosts can be co-located or spread over different geographical regions.

Buffer vs. cache

The semantics of a "buffer" and a "cache" are not totally different; even so, there are fundamental differences in intent between the process of caching and the process of buffering.

Fundamentally, caching realizes a performance increase for transfers of data that is being repeatedly transferred. While a caching system may realize a performance increase upon the initial (typically write) transfer of a data item, this performance increase is due to buffering occurring within the caching system.

With read caches, a data item must have been fetched from its residing location at least once in order for subsequent reads of the data item to realize a performance increase by virtue of being able to be fetched from the cache's (faster) intermediate storage rather than the data's residing location. With write caches, a performance increase of writing a data item may be realized upon the first write of the data item by virtue of the data item immediately being stored in the cache's intermediate storage, deferring the transfer of the data item to its residing storage at a later stage or else occurring as a background process. Contrary to strict buffering, a caching process must adhere to a (potentially distributed) cache coherency protocol in order to maintain consistency between the cache's intermediate storage and the location where the data resides. Buffering, on the other hand,

With typical caching implementations, a data item that is read or written for the first time is effectively being buffered; and in the case of a write, mostly realizing a performance increase for the application from where the write originated. Additionally, the portion of a caching protocol where individual writes are deferred to a batch of writes is a form of buffering. The portion of a caching protocol where individual reads are deferred to a batch of reads is also a form of buffering, although this form may negatively impact the performance of at least the initial reads (even though it may positively impact the performance of the sum of the individual reads). In practice, caching almost always involves some form of buffering, while strict buffering does not involve caching.

A buffer is a temporary memory location that is traditionally used because CPU instructions cannot directly address data stored in peripheral devices. Thus, addressable memory is used as an intermediate stage. Additionally, such a buffer may be feasible when a large block of data is assembled or disassembled (as required by a storage device), or when data may be delivered in a different order than that in which it is produced. Also, a whole buffer of data is usually transferred sequentially (for example to hard disk), so buffering itself sometimes increases transfer performance or reduces the variation or jitter of the transfer's latency as opposed to caching where the intent is to reduce the latency. These benefits are present even if the buffered data are written to the buffer once and read from the buffer once.

A cache also increases transfer performance. A part of the increase similarly comes from the possibility that multiple small transfers will combine into one large block. But the main performance-gain occurs because there is a good chance that the same data will be read from cache multiple times, or that written data will soon be read. A cache's sole purpose is to reduce accesses to the underlying slower storage. Cache is also usually an abstraction layer that is designed to be invisible from the perspective of neighboring layers.

See also

Related Research Articles

<span class="mw-page-title-main">Computer data storage</span> Storage of digital data readable by computers

Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.

<span class="mw-page-title-main">Web server</span> Computer software that distributes web pages

A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).

<span class="mw-page-title-main">Harvard architecture</span> Computer architecture where code and data each have a separate bus

The Harvard architecture is a computer architecture with separate storage and signal pathways for instructions and data. It is often contrasted with the von Neumann architecture, where program instructions and data share the same memory and pathways. This architecture is often used in real-time processing or low-power applications.

The MESI protocol is an invalidate-based cache coherence protocol, and is one of the most common protocols that support write-back caches. It is also known as the Illinois protocol due to its development at the University of Illinois at Urbana-Champaign. Write back caches can save considerable bandwidth generally wasted on a write through cache. There is always a dirty state present in write-back caches that indicates that the data in the cache is different from that in the main memory. The Illinois Protocol requires a cache-to-cache transfer on a miss if the block resides in another cache. This protocol reduces the number of main memory transactions with respect to the MSI protocol. This marks a significant improvement in performance.

In computer operating systems, memory paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. In this scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.

<span class="mw-page-title-main">Page table</span> Data structure that maps virtual addresses with physical addresses

A page table is a data structure used by a virtual memory system in a computer to store mappings between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, while physical addresses are used by the hardware, or more specifically, by the random-access memory (RAM) subsystem. The page table is a key component of virtual address translation that is necessary to access data in memory. The page table is set up by the computer's operating system, and may be read and written during the virtual address translation process by the memory management unit or by low-level system software or firmware.

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels, with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels, or even any level, sometimes some latter or all levels are implemented with eDRAM.

In computing, cache replacement policies are optimizing instructions or algorithms which a computer program or hardware-maintained structure can utilize to manage a cache of information. Caching improves performance by keeping recent or often-used data items in memory locations which are faster, or computationally cheaper to access, than normal memory stores. When the cache is full, the algorithm must choose which items to discard to make room for new data.

<span class="mw-page-title-main">Diskless node</span> Computer workstation operated without disk drives

A diskless node is a workstation or personal computer without disk drives, which employs network booting to load its operating system from a server.

Hierarchical storage management (HSM), also known as tiered storage, is a data storage and data management technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as solid-state drive arrays, are more expensive than slower devices, such as hard disk drives, optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.

A write buffer is a type of data buffer that can be used to hold data being written from the cache to main memory or to the next cache in the memory hierarchy to improve performance and reduce latency. It is used in certain CPU cache architectures like Intel's x86 and AMD64. In multi-core systems, write buffers destroy sequential consistency. Some software disciplines, like C11's data-race-freedom, are sufficient to regain a sequentially consistent view of memory.

The Network Livermore Timesharing System is an operating system that was actively developed at Lawrence Livermore Laboratory from 1979 until about 1988, though it continued to run production applications until 1995. An earlier system, the Livermore Time Sharing System had been developed over a decade earlier.

<span class="mw-page-title-main">Disk buffer</span>

In computer storage, a disk buffer is the embedded memory in a hard disk drive (HDD) or solid-state drive (SSD) acting as a buffer between the rest of the computer and the physical hard disk platter or flash memory that is used for storage. Modern hard disk drives come with 8 to 256 MiB of such memory, and solid-state drives come with up to 4 GB of cache memory.

In computing, a page cache, sometimes also called disk cache, is a transparent cache for the pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page cache in otherwise unused portions of the main memory (RAM), resulting in quicker access to the contents of cached pages and overall performance improvements. A page cache is implemented in kernels with the paging memory management, and is mostly transparent to applications.

libtorrent

libtorrent is an open-source implementation of the BitTorrent protocol. It is written in and has its main library interface in C++. Its most notable features are support for Mainline DHT, IPv6, HTTP seeds and μTorrent's peer exchange. libtorrent uses Boost, specifically Boost.Asio to gain its platform independence. It is known to build on Windows and most Unix-like operating systems.

This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.

In computing, a distributed cache is an extension of the traditional concept of cache used in a single locale. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. It is mainly used to store application data residing in database and web session data. The idea of distributed caching has become feasible now because main memory has become very cheap and network cards have become very fast, with 1 Gbit now standard everywhere and 10 Gbit gaining traction. Also, a distributed cache works well on lower cost machines usually employed for web servers as opposed to database servers which require expensive hardware. An emerging internet architecture known as Information-centric networking (ICN) is one of the best examples of a distributed cache network. The ICN is a network level solution hence the existing distributed network cache management schemes are not well suited for ICN. In the supercomputer environment, distributed cache is typically implemented in the form of burst buffer.

Virtual memory compression is a memory management technique that utilizes data compression to reduce the size or number of paging requests to and from the auxiliary storage. In a virtual memory compression system, pages to be paged out of virtual memory are compressed and stored in physical memory, which is usually random-access memory (RAM), or sent as compressed to auxiliary storage such as a hard disk drive (HDD) or solid-state drive (SSD). In both cases the virtual memory range, whose contents has been compressed, is marked inaccessible so that attempts to access compressed pages can trigger page faults and reversal of the process. The footprint of the data being paged is reduced by the compression process; in the first instance, the freed RAM is returned to the available physical memory pool, while the compressed portion is kept in RAM. In the second instance, the compressed data is sent to auxiliary storage but the resulting I/O operation is smaller and therefore takes less time.

In computing, cache algorithms are optimizing instructions‍—‌or algorithms‍—‌that a computer program or a hardware-maintained structure can follow in order to manage a cache of information stored on the computer. When the cache is full, the algorithm must choose which items to discard to make room for the new ones. Due to the inherent caching capability of nodes in Information-centric networking ICN, the ICN can be viewed as a loosely connect network of caches, which has unique requirements of Caching policies. Unlike proxy servers, in Information-centric networking the cache is a network level solution. Therefore, it has rapidly changing cache states and higher request arrival rates; moreover, smaller cache sizes further impose different kind of requirements on the content eviction policies. In particular, eviction policies for Information-centric networking should be fast and lightweight. Various cache replication and eviction schemes for different Information-centric networking architectures and applications are proposed.

References

  1. "Cache". Oxford Dictionaries. Archived from the original on 18 August 2012. Retrieved 2 August 2016.
  2. Zhong, Liang; Zheng, Xueqian; Liu, Yong; Wang, Mengting; Cao, Yang (February 2020). "Cache hit ratio maximization in device-to-device communications overlaying cellular networks". China Communications. 17 (2): 232–238. doi:10.23919/jcc.2020.02.018. ISSN   1673-5447. S2CID   212649328.
  3. Bottomley, James (1 January 2004). "Understanding Caching". Linux Journal. Retrieved 1 October 2019.
  4. Hennessy, John L.; Patterson, David A. (2011). Computer Architecture: A Quantitative Approach. Elsevier. p. B–12. ISBN   978-0-12-383872-8.
  5. Patterson, David A.; Hennessy, John L. (1990). Computer Architecture A Quantitative Approach. Morgan Kaufmann Publishers. p. 413. ISBN   1-55860-069-8.
  6. Su, Chao; Zeng, Qingkai (10 June 2021). Nicopolitidis, Petros (ed.). "Survey of CPU Cache-Based Side-Channel Attacks: Systematic Analysis, Security Models, and Countermeasures". Security and Communication Networks. 2021: 1–15. doi: 10.1155/2021/5559552 . ISSN   1939-0122.
  7. "Intel Broadwell Core i7 5775C '128MB L4 Cache' Gaming Behemoth and Skylake Core i7 6700K Flagship Processors Finally Available In Retail". 25 September 2015. Mentions L4 cache. Combined with separate I-Cache and TLB, this brings the total 'number of caches (levels+functions) to 6.
  8. "qualcom Hexagon DSP SDK overview".
  9. Frank Uyeda (2009). "Lecture 7: Memory Management" (PDF). CSE 120: Principles of Operating Systems. UC San Diego. Retrieved 4 December 2013.
  10. Bilal, Muhammad; et al. (2019). "Secure Distribution of Protected Content in Information-Centric Networking". IEEE Systems Journal. 14 (2): 1–12. arXiv: 1907.11717 . Bibcode:2020ISysJ..14.1921B. doi:10.1109/JSYST.2019.2931813. S2CID   198967720.
  11. Bilal, Muhammad; Kang, Shin-Gak (2014). Time Aware Least Recent Used (TLRU) cache management policy in ICN. 16th International Conference on Advanced Communication Technology. pp. 528–532. arXiv: 1801.00390 . Bibcode:2018arXiv180100390B. doi:10.1109/ICACT.2014.6779016. ISBN   978-89-968650-3-2. S2CID   830503.
  12. Bilal, Muhammad; et al. (2017). "A Cache Management Scheme for Efficient Content Eviction and Replication in Cache Networks". IEEE Access. 5: 1692–1701. arXiv: 1702.04078 . Bibcode:2017arXiv170204078B. doi:10.1109/ACCESS.2017.2669344. S2CID   14517299.
  13. Murphy, Chris (30 May 2011). "5 Lines Of Code In The Cloud". InformationWeek . p. 28. 300 million to 500 million fewer requests a day handled by AccuWeather servers
  14. Multiple (wiki). "Web application caching". Docforge. Archived from the original on 12 December 2019. Retrieved 24 July 2013.
  15. Tyson, Gareth; Mauthe, Andreas; Kaune, Sebastian; Mu, Mu; Plagemann, Thomas. Corelli: A Dynamic Replication Service for Supporting Latency-Dependent Content in Community Networks (PDF). MMCN'09. Archived from the original (PDF) on 18 June 2015.
  16. "Globally Distributed Content Delivery, by J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman and B. Weihl, IEEE Internet Computing, Volume 6, Issue 5, November 2002" (PDF). Archived (PDF) from the original on 9 August 2017. Retrieved 25 October 2019.
  17. "Definition: cloud storage gateway". SearchStorage. July 2014.
  18. Paul, S.; Fei, Z. (1 February 2001). "Distributed caching with centralized control". Computer Communications. 24 (2): 256–268. CiteSeerX   10.1.1.38.1094 . doi:10.1016/S0140-3664(00)00322-4.
  19. Khan, Iqbal (July 2009). "Distributed Caching on the Path To Scalability". MSDN. 24 (7).

Further reading