Memory-level parallelism

Last updated July 31, 2025

In computer architecture, memory-level parallelism (MLP) is the ability to have multiple memory operations pending at the same time, in particular cache misses or translation lookaside buffer (TLB) misses.
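One way to make this definition concrete is to count how many misses overlap in time. The sketch below is illustrative only: the function name `average_mlp`, the 100-cycle latency, and the metric itself (the average number of misses outstanding, taken over the cycles in which at least one miss is outstanding) are assumptions chosen for the example, not part of the definition.

```python
def average_mlp(miss_intervals):
    """Average number of misses outstanding, over cycles with at least one outstanding.

    Each interval is (start_cycle, end_cycle), with the end cycle exclusive.
    """
    if not miss_intervals:
        return 0.0
    events = []
    for start, end in miss_intervals:
        events.append((start, +1))
        events.append((end, -1))
    # At an equal cycle, -1 sorts before +1, so back-to-back misses do not overlap.
    events.sort()

    busy_cycles = 0   # cycles with at least one miss outstanding
    miss_cycles = 0   # sum over those cycles of the number of misses outstanding
    outstanding = 0
    prev = events[0][0]
    for cycle, delta in events:
        span = cycle - prev
        if outstanding > 0:
            busy_cycles += span
            miss_cycles += span * outstanding
        outstanding += delta
        prev = cycle
    return miss_cycles / busy_cycles if busy_cycles else 0.0


LATENCY = 100  # hypothetical miss latency in cycles

# Dependent misses (for example, pointer chasing): each starts when the previous returns.
serial = [(i * LATENCY, (i + 1) * LATENCY) for i in range(4)]
# Independent misses issued in consecutive cycles: they overlap almost entirely.
overlapped = [(i, i + LATENCY) for i in range(4)]

print(average_mlp(serial))      # 1.0
print(average_mlp(overlapped))  # 400/103, about 3.88
```

Under this toy metric, the dependent chain never has more than one miss pending, while the four independent misses are pending together for nearly their whole duration.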

In a single processor, MLP may be considered a form of instruction-level parallelism (ILP). However, ILP is often conflated with superscalar execution, the ability to execute more than one instruction at the same time, and the two are distinct. For example, a processor such as the Intel Pentium Pro is five-way superscalar, able to start executing five different microinstructions in a given cycle, yet it can handle four different cache misses for up to 20 different load microinstructions at any time.

It is possible to have a machine that is not superscalar but which nevertheless has high MLP.
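A toy timing model can illustrate why issue width and the number of outstanding misses are separate properties. The sketch below is not a model of any real processor: `run_loads`, its parameters, and the 100-cycle latency are hypothetical, and every load is assumed to be independent and to miss in the cache.

```python
import heapq


def run_loads(n_loads, issue_width, miss_buffers, latency):
    """Return the cycle at which the last of n_loads independent, missing loads completes.

    issue_width:  loads that may start per cycle (1 means not superscalar)
    miss_buffers: maximum number of misses that may be outstanding at once
    """
    in_flight = []  # min-heap of completion cycles, one entry per occupied buffer
    cycle = 0
    issued = 0
    finish = 0
    while issued < n_loads:
        # Free every buffer whose miss has returned by this cycle.
        while in_flight and in_flight[0] <= cycle:
            heapq.heappop(in_flight)
        started = 0
        while (issued < n_loads and started < issue_width
               and len(in_flight) < miss_buffers):
            done = cycle + latency
            heapq.heappush(in_flight, done)
            finish = max(finish, done)
            issued += 1
            started += 1
        cycle += 1
    return finish


# 20 loads, 100-cycle misses (illustrative numbers only)
print(run_loads(20, issue_width=1, miss_buffers=4, latency=100))  # 503
print(run_loads(20, issue_width=5, miss_buffers=1, latency=100))  # 2000
print(run_loads(20, issue_width=5, miss_buffers=4, latency=100))  # 500
```

In this model the scalar machine with four miss buffers finishes in about a quarter of the time of the five-wide machine that allows only one outstanding miss, and widening issue from one to five with the same four buffers saves only three cycles.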

Arguably, a machine that has no ILP (one that is not superscalar and executes one instruction at a time in a non-pipelined manner) but that performs hardware prefetching (as opposed to software, instruction-level prefetching) exhibits MLP, because multiple prefetches are outstanding, but not ILP. Multiple memory operations are outstanding, but multiple instructions are not. The distinction is easy to miss because instructions are often conflated with operations.
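The argument can be sketched with a small simulation of such a machine. Everything here is an assumption made for illustration: `run_blocking_core` is a hypothetical name, the prefetcher is a simple next-line scheme that requests the following `prefetch_depth` cache lines on every access, and the program is a sequential walk over `n_lines` lines. The core never has more than one instruction in flight, since each load stalls until its data arrives.

```python
def run_blocking_core(n_lines, latency, prefetch_depth):
    """Toy non-pipelined core: one load at a time, each stalling until its line arrives.

    Returns (total_cycles, peak_outstanding_memory_requests).
    """
    issued_at = {}   # cache line -> cycle its memory request was sent
    arrives_at = {}  # cache line -> cycle its data comes back
    now = 0

    def request(line):
        # Send a memory request unless one was already sent for this line.
        if line not in arrives_at:
            issued_at[line] = now
            arrives_at[line] = now + latency

    for line in range(n_lines):
        request(line)  # demand access (no new request if the line was prefetched)
        for ahead in range(line + 1, min(line + 1 + prefetch_depth, n_lines)):
            request(ahead)  # hardware prefetch, not an instruction
        # The single in-flight instruction waits for its data, then takes one cycle.
        now = max(now, arrives_at[line]) + 1

    # Peak overlap of request lifetimes; arrivals sort before issues at equal cycles.
    events = sorted([(c, +1) for c in issued_at.values()] +
                    [(c, -1) for c in arrives_at.values()])
    outstanding = peak = 0
    for _, delta in events:
        outstanding += delta
        peak = max(peak, outstanding)
    return now, peak


print(run_blocking_core(8, latency=100, prefetch_depth=0))  # (808, 1)
print(run_blocking_core(8, latency=100, prefetch_depth=3))  # (205, 4)
```

With prefetching enabled, up to four memory requests are outstanding at once even though the instruction stream is still executed strictly one instruction at a time.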

Furthermore, multiprocessor and multithreaded computer systems may be said to exhibit MLP and ILP due to parallelism, but not intra-thread, single-process ILP and MLP. Often, however, the terms MLP and ILP are restricted to extracting such parallelism from what appears to be non-parallel, single-threaded code.


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
