Power law of cache misses


A power law is a mathematical relationship between two quantities in which one is proportional to some power of the other. The power law for cache misses was first established by C. K. Chow in his 1974 paper, [1] supported by experimental data on hit ratios for stack processing collected by Richard Mattson in 1971. [2] Given a tolerable miss rate, the power law of cache misses can be used to narrow the candidate cache sizes to a practical range as one of the early steps in designing the cache hierarchy of a uniprocessor system. [3]


The power law for cache misses can be stated as

M = M0 C^(−α)

where M is the miss rate for a cache of size C and M0 is the miss rate of a baseline cache. The exponent α is workload-specific and typically ranges from 0.3 to 0.7. [4]
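
For illustration, the following Python sketch applies the power law to estimate miss rates and to solve for the cache size needed to reach a tolerable miss rate. It is not taken from the cited sources; the baseline size, baseline miss rate, and exponent are assumed values, and the equivalent ratio form M = M0 (C/C0)^(−α) is used so that sizes can be expressed in kibibytes.

```python
# Minimal sketch of the cache-miss power law, M = M0 * (C / C0) ** (-alpha).
# The baseline size, baseline miss rate and alpha below are illustrative assumptions.

def miss_rate(cache_size_kib, baseline_size_kib, baseline_miss_rate, alpha=0.5):
    """Predicted miss rate for a cache of the given size, scaled from a baseline cache."""
    return baseline_miss_rate * (cache_size_kib / baseline_size_kib) ** (-alpha)

def required_size(target_miss_rate, baseline_size_kib, baseline_miss_rate, alpha=0.5):
    """Cache size (same units as the baseline) needed to reach target_miss_rate."""
    return baseline_size_kib * (baseline_miss_rate / target_miss_rate) ** (1.0 / alpha)

# Example: a 32 KiB baseline cache with a 5% miss rate and alpha = 0.5.
print(miss_rate(128, 32, 0.05))       # ~0.025: quadrupling the size halves the miss rate
print(required_size(0.01, 32, 0.05))  # ~800 KiB needed to reach a 1% miss rate
```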

Caveats

The power law gives a useful estimate of the miss rate only up to a certain cache size. A large enough cache eliminates capacity misses, and increasing the cache size further does not reduce the miss rate any further, contrary to the power law's prediction. [3]
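
A hedged sketch of this caveat: once the cache is large enough to hold the entire working set, the predicted miss rate can be clamped to a floor of remaining (e.g. compulsory) misses rather than extrapolating the power law further. The working-set size and the floor below are assumed values, not measurements.

```python
# Sketch of the caveat: beyond the working-set size, capacity misses disappear and
# the miss rate flattens at a floor instead of following the power law further down.
# The working-set size and compulsory-miss floor are illustrative assumptions.

def capped_miss_rate(cache_size_kib, baseline_size_kib, baseline_miss_rate,
                     working_set_kib, compulsory_floor, alpha=0.5):
    if cache_size_kib >= working_set_kib:
        return compulsory_floor                      # power law no longer applies
    predicted = baseline_miss_rate * (cache_size_kib / baseline_size_kib) ** (-alpha)
    return max(predicted, compulsory_floor)          # never predict below the floor

for size_kib in (32, 128, 512, 2048, 8192):
    print(size_kib, capped_miss_rate(size_kib, 32, 0.05,
                                     working_set_kib=4096, compulsory_floor=0.002))
```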

The validity of the power law of cache misses also depends on the size of a process's working memory set and on the temporal re-reference pattern of its cache blocks. If a process has a working memory set that is small relative to the cache size, capacity misses are unlikely and the power law does not hold.

Although conflict misses decrease as associativity increases, Hartstein et al. [4] showed that the power law holds irrespective of set associativity.

Hartstein et al. plotted the number of cache block re-accesses versus their re-reference times for a large number of workloads and found that most also follow a power law relationship, [4]

R(t) = R0 t^(−β)

where R(t) is the rate of re-referencing at re-reference time t. The exponent β was found to range between 1.3 and 1.7. It was shown theoretically that the power laws of cache re-reference and cache miss rate are related by the equation α = β − 1. This means that for workloads that do not follow the re-reference power law, the power law of cache misses does not hold true.
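
As a hedged illustration of how the two exponents relate, the sketch below fits β to synthetic re-reference counts with an ordinary log-log least-squares fit and then derives α = β − 1; the data is generated for this example, not measured.

```python
import math

# Synthetic re-reference counts following R(t) = R0 * t**(-beta) with beta = 1.5 (assumed).
times = [1, 2, 4, 8, 16, 32, 64, 128]
counts = [1000.0 * t ** (-1.5) for t in times]

# Ordinary least-squares fit of log R = log R0 - beta * log t.
xs = [math.log(t) for t in times]
ys = [math.log(r) for r in counts]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
        sum((x - x_mean) ** 2 for x in xs)

beta = -slope
alpha = beta - 1              # relation between the re-reference and miss-rate exponents
print(beta, alpha)            # ~1.5 and ~0.5 for this synthetic data
```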

Multilevel cache hierarchy

In a multilevel cache hierarchy, the miss pattern of the higher-level cache becomes the re-reference pattern of the cache immediately below it. Hartstein et al. [4] found that, whereas the cache misses at lower levels do not follow a strict power law, the miss rate function can still be approximated by a power law as long as the lower-level cache is considerably larger than the higher-level cache.
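
A minimal sketch of this approximation, under assumed sizes, miss rates and exponent (not Hartstein et al.'s derivation): the level-2 cache is treated as a single larger cache obeying the same power law, and its local miss rate follows from dividing by the level-1 miss rate, since every level-2 access is a level-1 miss.

```python
# Sketch: chaining the power law across cache levels. The misses of the level-1
# cache form the reference stream of the level-2 cache; as long as L2 is much
# larger than L1, its miss rate is approximated with the same power-law form.
# All sizes, miss rates and the exponent here are illustrative assumptions.

def power_law_miss_rate(size_kib, baseline_size_kib, baseline_miss_rate, alpha=0.5):
    return baseline_miss_rate * (size_kib / baseline_size_kib) ** (-alpha)

l1_size_kib, l1_miss_rate = 32, 0.05   # assumed L1 baseline
l2_size_kib = 1024                     # L2 considerably larger than L1

# Global miss rate: fraction of all references that miss in both L1 and L2,
# obtained by treating L2 as one large cache under the same law.
global_l2_miss_rate = power_law_miss_rate(l2_size_kib, l1_size_kib, l1_miss_rate)

# Local L2 miss rate: L2 misses divided by L2 accesses (which are the L1 misses).
local_l2_miss_rate = global_l2_miss_rate / l1_miss_rate
print(global_l2_miss_rate, local_l2_miss_rate)   # ~0.0088 and ~0.18 for these numbers
```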

See also

Power law
Locality of reference
Memory hierarchy
Cache hierarchy
CPU cache
Cache placement policies
Victim cache
Translation lookaside buffer

References

  1. Chow, C. K. (May 1974). "On Optimization of Storage Hierarchies". IBM Journal of Research and Development. 18 (3): 194–203. doi:10.1147/rd.183.0194.
  2. Mattson, R. (December 1971). "Evaluation of multilevel memories". IEEE Transactions on Magnetics. 7 (4): 814–819. Bibcode:1971ITM.....7..814M. doi:10.1109/TMAG.1971.1067237.
  3. Solihin, Yan (17 November 2015). Fundamentals of Parallel Multicore Architecture. Chapman & Hall. ISBN 978-1482211184.
  4. Hartstein, A.; Srinivasan, V.; Puzak, T. R.; Emma, P. G. (2006-01-01). "Cache miss behavior". Proceedings of the 3rd Conference on Computing Frontiers. CF '06. New York, NY, USA: ACM. pp. 313–320. doi:10.1145/1128022.1128064. ISBN 1595933026. S2CID 17728397.