In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law, which was particularly focused on parallel processing. However, speedup can be used more generally to show the effect on performance after any resource enhancement.
Speedup can be defined for two different types of quantities: latency and throughput.[1]
Latency of an architecture is the reciprocal of the execution speed of a task:

$$L = \frac{1}{v} = \frac{T}{W}$$

where

- $v$ is the execution speed of the task;
- $T$ is the execution time of the task;
- $W$ is the execution workload of the task.
Throughput of an architecture is the execution rate of a task:

$$Q = \rho v A = \frac{\rho A W}{T} = \frac{\rho A}{L}$$

where

- $\rho$ is the execution density (e.g., the number of stages in an instruction pipeline for a pipelined architecture);
- $A$ is the execution capacity (e.g., the number of processors for a parallel architecture).
Latency is often measured in seconds per unit of execution workload. Throughput is often measured in units of execution workload per second. Another unit of throughput is instructions per cycle (IPC), and its reciprocal, cycles per instruction (CPI), is another unit of latency.
Speedup is dimensionless and defined differently for each type of quantity so that it is a consistent metric.
Speedup in latency is defined by the following formula:[2]

$$S_{\text{latency}} = \frac{L_1}{L_2} = \frac{T_1 W_2}{T_2 W_1}$$

where

- $S_\text{latency}$ is the speedup in latency of architecture 2 with respect to architecture 1;
- $L_1$ is the latency of architecture 1;
- $L_2$ is the latency of architecture 2.
Speedup in latency can be predicted from Amdahl's law or Gustafson's law.
Speedup in throughput is defined by the formula:[3]

$$S_{\text{throughput}} = \frac{Q_2}{Q_1} = \frac{\rho_2 v_2 A_2}{\rho_1 v_1 A_1} = \frac{\rho_2 A_2 T_1 W_2}{\rho_1 A_1 T_2 W_1} = \frac{\rho_2 A_2}{\rho_1 A_1} S_{\text{latency}}$$

where

- $S_\text{throughput}$ is the speedup in throughput of architecture 2 with respect to architecture 1;
- $Q_1$ is the throughput of architecture 1;
- $Q_2$ is the throughput of architecture 2.
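As a minimal sketch of how these definitions fit together (the function and parameter names here are illustrative, not taken from any source), the quantities above can be computed directly:

```python
# Illustrative helpers for the latency/throughput definitions above.

def latency(T, W):
    """Latency L = T / W: seconds per unit of execution workload."""
    return T / W

def throughput(T, W, rho=1.0, A=1.0):
    """Throughput Q = rho * A * W / T: workload units per second."""
    return rho * A * W / T

def speedup_latency(L1, L2):
    """S_latency = L1 / L2: speedup of architecture 2 over architecture 1."""
    return L1 / L2

def speedup_throughput(Q1, Q2):
    """S_throughput = Q2 / Q1: speedup of architecture 2 over architecture 1."""
    return Q2 / Q1
```

Both speedups are ratios of like quantities, which is why they come out dimensionless.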
We are testing the effectiveness of a branch predictor on the execution of a program. First, we execute the program with the standard branch predictor on the processor, which yields an execution time of 2.25 seconds. Next, we execute the program with our modified (and hopefully improved) branch predictor on the same processor, which produces an execution time of 1.50 seconds. In both cases the execution workload is the same. Using our speedup formula, we know

$$S_{\text{latency}} = \frac{L_{\text{old}}}{L_{\text{new}}} = \frac{2.25\ \text{s}}{1.50\ \text{s}} = 1.5.$$
Our new branch predictor has provided a 1.5x speedup over the original.
We can also measure speedup in cycles per instruction (CPI), which is a latency. First, we execute the program with the standard branch predictor, which yields a CPI of 3. Next, we execute the program with our modified branch predictor, which yields a CPI of 2. In both cases the execution workload is the same and neither architecture is pipelined or parallel. Using the speedup formula gives

$$S_{\text{latency}} = \frac{\text{CPI}_{\text{old}}}{\text{CPI}_{\text{new}}} = \frac{3}{2} = 1.5.$$
We can also measure speedup in instructions per cycle (IPC), which is a throughput and the inverse of CPI. Using the speedup formula gives

$$S_{\text{throughput}} = \frac{\text{IPC}_{\text{new}}}{\text{IPC}_{\text{old}}} = \frac{1/2}{1/3} = 1.5.$$
We achieve the same 1.5x speedup, though we measured different quantities.
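The arithmetic in this example is easy to verify directly; here is a minimal sketch using only the numbers quoted above:

```python
# Branch-predictor example: three measurements, one speedup.

t_old, t_new = 2.25, 1.50        # execution times in seconds
cpi_old, cpi_new = 3.0, 2.0      # cycles per instruction (a latency)
ipc_old, ipc_new = 1 / cpi_old, 1 / cpi_new  # instructions per cycle (a throughput)

print(t_old / t_new)      # speedup in latency from execution time: 1.5
print(cpi_old / cpi_new)  # speedup in latency from CPI: 1.5
print(ipc_new / ipc_old)  # speedup in throughput from IPC: 1.5
```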
Let S be the speedup of execution of a task and s the speedup of execution of the part of the task that benefits from the improvement of the resources of an architecture. Linear speedup or ideal speedup is obtained when S = s. When running a task with linear speedup, doubling the local speedup doubles the overall speedup. As this is ideal, it is considered very good scalability.
Efficiency is a metric of the utilization of the resources of the improved system, defined as

$$\eta = \frac{S}{s}.$$

Its value is typically between 0 and 1. Programs with linear speedup and programs running on a single processor have an efficiency of 1, while many difficult-to-parallelize programs have an efficiency such as 1/ln(s) that approaches 0 as the number of processors A = s increases.
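As an illustrative sketch of these definitions (the 1/ln(s) decay is just the example quoted above, not a general law):

```python
import math

def efficiency(S, s):
    """Efficiency eta = S / s: overall speedup over local speedup."""
    return S / s

# Linear speedup: S == s, so efficiency is 1 at any scale.
print(efficiency(8.0, 8.0))      # 1.0

# A hard-to-parallelize program whose efficiency decays like 1/ln(s):
for s in (4, 64, 1024):
    S = s / math.log(s)          # overall speedup implied by eta = 1/ln(s)
    print(s, efficiency(S, s))   # efficiency falls toward 0 as s grows
```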
In engineering contexts, efficiency curves are more often used for graphs than speedup curves, since

- all of the area in the graph is useful (whereas in speedup curves half of the space is wasted);
- it is easy to see how well the improvement of the system is working;
- there is no need to plot a "perfect speedup" curve.
In marketing contexts, speedup curves are more often used, largely because they go up and to the right and thus appear better to the less-informed.
Sometimes a speedup of more than A when using A processors is observed in parallel computing, which is called super-linear speedup. Super-linear speedup rarely happens and often confuses beginners, who believe the theoretical maximum speedup should be A when A processors are used.
One possible reason for super-linear speedup in low-level computations is the cache effect resulting from the different memory hierarchies of a modern computer: in parallel computing, not only does the number of processors change, but so does the size of the accumulated caches from the different processors. With the larger accumulated cache size, more or even all of the working set can fit into caches, and the memory access time reduces dramatically, which causes the extra speedup in addition to that from the actual computation.[4]
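A back-of-the-envelope illustration of this cache effect, assuming a standard two-level average-memory-access-time (AMAT) model and entirely made-up hit rates and timings:

```python
# Toy model: AMAT = hit_rate * cache_time + (1 - hit_rate) * memory_time.
# All numbers below are illustrative assumptions, not measurements.

def amat(hit_rate, cache_ns=1.0, memory_ns=100.0):
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

A = 4                                    # number of processors
serial_time = amat(hit_rate=0.80)        # working set overflows a single cache
parallel_time = amat(hit_rate=0.99) / A  # with 4 caches the working set fits

print(serial_time / parallel_time)       # roughly 42x, well above A = 4
```

Because the per-access latency itself drops when the aggregate cache grows, the speedup can exceed the processor count even though each processor does a fair share of the work.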
An analogous situation occurs when searching large datasets, such as the genomic data searched by BLAST implementations. There the accumulated RAM from each of the nodes in a cluster enables the dataset to move from disk into RAM, thereby drastically reducing the time required by, e.g., mpiBLAST to search it.[5]
Super-linear speedups can also occur when performing backtracking in parallel: an exception in one thread can cause several other threads to backtrack early, before they reach the exception themselves. [6]
Super-linear speedups can also occur in parallel implementations of branch-and-bound for optimization: [7] the processing of one node by one processor may affect the work other processors need to do for the other nodes.
In computer architecture, Amdahl's law is a formula that shows how much faster a task can be completed when more resources are added to the system.
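In its standard form (stated here for reference, with the conventional symbols p for the improvable fraction of the task and s for the speedup of that fraction):

$$S_{\text{latency}} = \frac{1}{(1 - p) + \dfrac{p}{s}}$$

so that even as s grows without bound, the overall speedup is capped at 1/(1 − p).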
In computer architecture, Gustafson's law gives the speedup in the execution time of a task that theoretically gains from parallel computing, using a hypothetical run of the task on a single-core machine as the baseline. To put it another way, it is the theoretical "slowdown" of an already parallelized task if running on a serial machine. It is named after computer scientist John L. Gustafson and his colleague Edwin H. Barsis, and was presented in the article Reevaluating Amdahl's Law in 1988.
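A common statement of Gustafson's law, with the conventional symbols N for the number of processors and σ for the serial fraction of the workload as measured on the parallel system, is

$$S_{\text{latency}} = N + (1 - N)\,\sigma,$$

so the achievable speedup grows linearly with N rather than saturating as in Amdahl's law.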
Within theoretical computer science, the Sun–Ni law is a memory-bounded speedup model which states that as computing power increases, the corresponding increase in problem size is constrained by the system's memory capacity. In general, as a system grows in computational power, the problems run on the system increase in size. Analogous to Amdahl's law, which says that the problem size remains constant as system sizes grow, and Gustafson's law, which proposes that the problem size should scale but be bound by a fixed amount of time, the Sun–Ni law states that the problem size should scale but be bound by the memory capacity of the system. The Sun–Ni law was first proposed by Xian-He Sun and Lionel Ni in the proceedings of the 1990 IEEE Supercomputing Conference.
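A commonly quoted form of the memory-bounded speedup (a paraphrase with conventional symbols, where f is the parallel fraction of the workload and G(n) models how the workload grows as memory scales with n processors) is

$$S_{\text{memory-bounded}} = \frac{(1 - f) + f\,G(n)}{(1 - f) + f\,\dfrac{G(n)}{n}},$$

which reduces to Amdahl's law when G(n) = 1 and to Gustafson's law when G(n) = n.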