Algorithmic efficiency

Last updated

In computer science, algorithmic efficiency is a property of an algorithm which relates to the number of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algorithm can be measured based on usage of different resources. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.

Computer science Study of the theoretical foundations of information and computation

Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store, and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems.

Algorithm An unambiguous specification of how to solve a class of problems

In mathematics and computer science, an algorithm is a set of instructions, typically to solve a class of problems or perform a computation. Algorithms are unambiguous specifications for performing calculation, data processing, automated reasoning, and other tasks.

In computational complexity theory, a computational resource is a resource used by some computational models in the solution of computational problems.

Contents

For maximum efficiency we wish to minimize resource usage. However, different resources such as time and space complexity cannot be compared directly, so which of two algorithms is considered to be more efficient often depends on which measure of efficiency is considered most important.

Time complexity An estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to differ by at most a constant factor.

In computer science, the space complexity of an algorithm or a computer program is the amount of memory space required to solve an instance of the computational problem as a function of the size of the input. It is the memory required by an algorithm to execute a program and produce output.

For example, bubble sort and timsort are both algorithms to sort a list of items from smallest to largest. Bubble sort sorts the list in time proportional to the number of elements squared (, see Big O notation), but only requires a small amount of extra memory which is constant with respect to the length of the list (). Timsort sorts the list in time linearithmic (proportional to a quantity times its logarithm) in the list's length (), but has a space requirement linear in the length of the list (). If large lists must be sorted at high speed for a given application, timsort is a better choice; however, if minimizing the memory footprint of the sorting is more important, bubble sort is a better choice.

Bubble sort Simple comparison sorting algorithm

Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted. The algorithm, which is a comparison sort, is named for the way smaller or larger elements "bubble" to the top of the list. Although the algorithm is simple, it is too slow and impractical for most problems even when compared to insertion sort. Bubble sort can be practical if the input is in mostly sorted order with some out-of-order elements nearly in position.

Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It uses techniques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 467–474, January 1993. It was implemented by Tim Peters in 2002 for use in the Python programming language. The algorithm finds subsequences of the data that are already ordered, and uses that knowledge to sort the remainder more efficiently. This is done by merging an identified subsequence, called a run, with existing runs until certain criteria are fulfilled. Timsort has been Python's standard sorting algorithm since version 2.3. It is also used to sort arrays of non-primitive type in Java SE 7, on the Android platform, in GNU Octave, and Google Chrome.

In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most frequently used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output. More formally, the output of any sorting algorithm must satisfy two conditions:

  1. The output is in nondecreasing order ;
  2. The output is a permutation of the input.

Background

The importance of efficiency with respect to time was emphasised by Ada Lovelace in 1843 as applying to Charles Babbage's mechanical analytical engine:

Ada Lovelace English mathematician, computer programming pioneer

Augusta Ada King, Countess of Lovelace was an English mathematician and writer, chiefly known for her work on Charles Babbage's proposed mechanical general-purpose computer, the Analytical Engine. She was the first to recognise that the machine had applications beyond pure calculation, and published the first algorithm intended to be carried out by such a machine. As a result, she is sometimes regarded as the first to recognise the full potential of a "computing machine" and one of the first computer programmers.

Charles Babbage English mathematician, philosopher, and engineer (1791-1871)

Charles Babbage was an English polymath. A mathematician, philosopher, inventor and mechanical engineer, Babbage originated the concept of a digital programmable computer.

"In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selections amongst them for the purposes of a calculating engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation" [1]

Early electronic computers were severely limited both by the speed of operations and the amount of memory available. In some cases it was realized that there was a space–time trade-off, whereby a task could be handled either by using a fast algorithm which used quite a lot of working memory, or by using a slower algorithm which used very little working memory. The engineering trade-off was then to use the fastest algorithm which would fit in the available memory.

Task (computing) computing term; execution path through address space

In computing, a task is a unit of execution or a unit of work. The term is ambiguous; precise alternative terms include process, light-weight process, thread, step, request, or query. In the adjacent diagram, there are queues of incoming work to do and outgoing completed work, and a thread pool of threads to perform this work. Either the work units themselves or the threads that perform the work can be referred to as "tasks", and these can be referred to respectively as requests/responses/threads, incoming tasks/completed tasks/threads, or requests/responses/tasks.

Modern computers are significantly faster than the early computers, and have a much larger amount of memory available (Gigabytes instead of Kilobytes). Nevertheless, Donald Knuth emphasised that efficiency is still an important consideration:

"In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering" [2]

Overview

An algorithm is considered efficient if its resource consumption, also known as computational cost, is at or below some acceptable level. Roughly speaking, 'acceptable' means: it will run in a reasonable amount of time or space on an available computer, typically as a function of the size of the input. Since the 1950s computers have seen dramatic increases in both the available computational power and in the available amount of memory, so current acceptable levels would have been unacceptable even 10 years ago. In fact, thanks to the approximate doubling of computer power every 2 years, tasks that are acceptably efficient on modern smartphones and embedded systems may have been unacceptably inefficient for industrial servers 10 years ago.

Computer manufacturers frequently bring out new models, often with higher performance. Software costs can be quite high, so in some cases the simplest and cheapest way of getting higher performance might be to just buy a faster computer, provided it is compatible with an existing computer.

There are many ways in which the resources used by an algorithm can be measured: the two most common measures are speed and memory usage; other measures could include transmission speed, temporary disk usage, long-term disk usage, power consumption, total cost of ownership, response time to external stimuli, etc. Many of these measures depend on the size of the input to the algorithm, i.e. the amount of data to be processed. They might also depend on the way in which the data is arranged; for example, some sorting algorithms perform poorly on data which is already sorted, or which is sorted in reverse order.

In practice, there are other factors which can affect the efficiency of an algorithm, such as requirements for accuracy and/or reliability. As detailed below, the way in which an algorithm is implemented can also have a significant effect on actual efficiency, though many aspects of this relate to optimization issues.

Theoretical analysis

In the theoretical analysis of algorithms, the normal practice is to estimate their complexity in the asymptotic sense. The most commonly used notation to describe resource consumption or "complexity" is Donald Knuth's Big O notation, representing the complexity of an algorithm as a function of the size of the input . Big O notation is an asymptotic measure of function complexity, where roughly means the time requirement for an algorithm is proportional to , omitting lower-order terms that contribute less than to the growth of the function as grows arbitrarily large. This estimate may be misleading when is small, but is generally sufficiently accurate when is large as the notation is asymptotic. For example, bubble sort may be faster than merge sort when only a few items are to be sorted; however either implementation is likely to meet performance requirements for a small list. Typically, programmers are interested in algorithms that scale efficiently to large input sizes, and merge sort is preferred over bubble sort for lists of length encountered in most data-intensive programs.

Some examples of Big O notation applied to algorithms' asymptotic time complexity include:

NotationNameExamples
constant Finding the median from a sorted list of measurements; Using a constant-size lookup table; Using a suitable hash function for looking up an item.
logarithmic Finding an item in a sorted array with a binary search or a balanced search tree as well as all operations in a Binomial heap.
linear Finding an item in an unsorted list or a malformed tree (worst case) or in an unsorted array; Adding two n-bit integers by ripple carry.
linearithmic, loglinear, or quasilinearPerforming a Fast Fourier transform; heapsort, quicksort (best and average case), or merge sort
quadratic Multiplying two n-digit numbers by a simple algorithm; bubble sort (worst case or naive implementation), Shell sort, quicksort (worst case), selection sort or insertion sort
exponential Finding the optimal (non-approximate) solution to the travelling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute-force search

Benchmarking: measuring performance

For new versions of software or to provide comparisons with competitive systems, benchmarks are sometimes used, which assist with gauging an algorithms relative performance. If a new sort algorithm is produced, for example, it can be compared with its predecessors to ensure that at least it is efficient as before with known data, taking into consideration any functional improvements. Benchmarks can be used by customers when comparing various products from alternative suppliers to estimate which product will best suit their specific requirements in terms of functionality and performance. For example, in the mainframe world certain proprietary sort products from independent software companies such as Syncsort compete with products from the major suppliers such as IBM for speed.

Some benchmarks provide opportunities for producing an analysis comparing the relative speed of various compiled and interpreted languages for example [3] [4] and The Computer Language Benchmarks Game compares the performance of implementations of typical programming problems in several programming languages.

Even creating "do it yourself" benchmarks can demonstrate the relative performance of different programming languages, using a variety of user specified criteria. This is quite simple, as a "Nine language performance roundup" by Christopher W. Cowell-Shah demonstrates by example. [5]

Implementation concerns

Implementation issues can also have an effect on efficiency, such as the choice of programming language, or the way in which the algorithm is actually coded, [6] or the choice of a compiler for a particular language, or the compilation options used, or even the operating system being used. In many cases a language implemented by an interpreter may be much slower than a language implemented by a compiler. [3] See the articles on just-in-time compilation and interpreted languages.

There are other factors which may affect time or space issues, but which may be outside of a programmer's control; these include data alignment, data granularity, cache locality, cache coherency, garbage collection, instruction-level parallelism, multi-threading (at either a hardware or software level), simultaneous multitasking, and subroutine calls. [7]

Some processors have capabilities for vector processing, which allow a single instruction to operate on multiple operands; it may or may not be easy for a programmer or compiler to use these capabilities. Algorithms designed for sequential processing may need to be completely redesigned to make use of parallel processing, or they could be easily reconfigured. As parallel and distributed computing grow in importance in the late 2010's, more investments are being made into efficient high-level APIs for parallel and distributed computing systems such as CUDA, TensorFlow, Hadoop, OpenMP and MPI.

Another problem which can arise in programming is that processors compatible with the same instruction set (such as x86-64 or ARM) may implement an instruction in different ways, so that instructions which are relatively fast on some models may be relatively slow on other models. This often presents challenges to optimizing compilers, which must have a great amount of knowledge of the specific CPU and other hardware available on the compilation target to best optimize a program for performance. In the extreme case, a compiler may be forced to emulate instructions not supported on a compilation target platform, forcing it to generate code or link an external library call to produce a result that is otherwise incomputable on that platform, even if it is natively supported and more efficient in hardware on other platforms. This is often the case in embedded systems with respect to floating-point arithmetic, where small and low-power microcontrollers often lack hardware support for floating-point arithmetic and thus require computationally expensive software routines to produce floating point calculations.

Measures of resource usage

Measures are normally expressed as a function of the size of the input .

The two most common measures are:

For computers whose power is supplied by a battery (e.g. laptops and smartphones), or for very long/large calculations (e.g. supercomputers), other measures of interest are:

As of 2018, power consumption is growing as an important metric for computational tasks of all types and at all scales ranging from embedded Internet of things devices to system-on-chip devices to server farms. This trend is often referred to as green computing.

Less common measures of computational efficiency may also be relevant in some cases:

Time

Theory

Analyze the algorithm, typically using time complexity analysis to get an estimate of the running time as a function of the size of the input data. The result is normally expressed using Big O notation. This is useful for comparing algorithms, especially when a large amount of data is to be processed. More detailed estimates are needed to compare algorithm performance when the amount of data is small, although this is likely to be of less importance. Algorithms which include parallel processing may be more difficult to analyze.

Practice

Use a benchmark to time the use of an algorithm. Many programming languages have an available function which provides CPU time usage. For long-running algorithms the elapsed time could also be of interest. Results should generally be averaged over several tests.

Run-based profiling can be very sensitive to hardware configuration and the possibility of other programs or tasks running at the same time in a multi-processing and multi-programming environment.

This sort of test also depends heavily on the selection of a particular programming language, compiler, and compiler options, so algorithms being compared must all be implemented under the same conditions.

Space

This section is concerned with the use of memory resources (registers, cache, RAM, virtual memory, secondary memory) while the algorithm is being executed. As for time analysis above, analyze the algorithm, typically using space complexity analysis to get an estimate of the run-time memory needed as a function as the size of the input data. The result is normally expressed using Big O notation.

There are up to four aspects of memory usage to consider:

Early electronic computers, and early home computers, had relatively small amounts of working memory. For example, the 1949 Electronic Delay Storage Automatic Calculator (EDSAC) had a maximum working memory of 1024 17-bit words, while the 1980 Sinclair ZX80 came initially with 1024 8-bit bytes of working memory. In the late 2010s, it is typical for personal computers to have between 4 and 32 GB of RAM, an increase of over 300 million times as much memory.

Caching and memory hierarchy

Current computers can have relatively large amounts of memory (possibly Gigabytes), so having to squeeze an algorithm into a confined amount of memory is much less of a problem than it used to be. But the presence of four different categories of memory can be significant:

  • Processor registers, the fastest of computer memory technologies with the least amount of storage space. Most direct computation on modern computers occurs with source and destination operands in registers before being updated to the cache, main memory and virtual memory if needed. On a processor core, there are typically on the order of hundreds of bytes or fewer of register availability, although a register file may contain more physical registers than architectural registers defined in the instruction set architecture.
  • Cache memory is the second fastest and second smallest memory available in the memory hierarchy. Caches are present in CPUs, GPUs, hard disk drives and external peripherals, and are typically implemented in static RAM. Memory caches are multi-leveled; lower levels are larger, slower and typically shared between processor cores in multi-core processors. In order to process operands in cache memory, a processing unit must fetch the data from the cache, perform the operation in registers and write the data back to the cache. This operates at speeds comparable (about 2-10 times slower) with the CPU or GPU's arithmetic logic unit or floating-point unit if in the L1 cache. [8] It is about 10 times slower if there is an L1 cache miss and it must be retrieved from and written to the L2 cache, and a further 10 times slower if there is an L2 cache miss and it must be retrieved from an L3 cache, if present.
  • Main physical memory is most often implemented in dynamic RAM (DRAM). The main memory is much larger (typically gigabytes compared to ≈8 megabytes) than an L3 CPU cache, with read and write latencies typically 10-100 times slower. [8] As of 2018, RAM is increasingly implemented on-chip of processors, as CPU or GPU memory.
  • Virtual memory is most often implemented in terms of secondary storage such as a hard disk, and is an extension to the memory hierarchy that has much larger storage space but much larger latency, typically around 1000 times slower than a cache miss for a value in RAM. [8] While originally motivated to create the impression of higher amounts of memory being available than were truly available, virtual memory is more important in contemporary usage for its time-space tradeoff and enabling the usage of virtual machines. [8] Cache misses from main memory are called page faults, and incur huge performance penalties on programs.

An algorithm whose memory needs will fit in cache memory will be much faster than an algorithm which fits in main memory, which in turn will be very much faster than an algorithm which has to resort to virtual memory. Because of this, cache replacement policies are extremely important to high-performance computing, as are cache-aware programming and data alignment. To further complicate the issue, some systems have up to three levels of cache memory, with varying effective speeds. Different systems will have different amounts of these various types of memory, so the effect of algorithm memory needs can vary greatly from one system to another.

In the early days of electronic computing, if an algorithm and its data wouldn't fit in main memory then the algorithm couldn't be used. Nowadays the use of virtual memory appears to provide lots of memory, but at the cost of performance. If an algorithm and its data will fit in cache memory, then very high speed can be obtained; in this case minimizing space will also help minimize time. This is called the principle of locality, and can be subdivided into locality of reference, spatial locality and temporal locality. An algorithm which will not fit completely in cache memory but which exhibits locality of reference may perform reasonably well.

Criticism of the current state of programming

Software efficiency halves every 18 months, compensating Moore's Law

May goes on to state:

In ubiquitous systems, halving the instructions executed can double the battery life and big data sets bring big opportunities for better software and algorithms: Reducing the number of operations from N x N to N x log(N) has a dramatic effect when N is large ... for N = 30 billion, this change is as good as 50 years of technology improvements.

Competitions for the best algorithms

The following competitions invite entries for the best algorithms based on some arbitrary criteria decided by the judges:

See also

Related Research Articles

Analysis of algorithms study of time and space resources used to execute a computational process

In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms – the amount of time, storage, or other resources needed to execute them. Usually, this involves determining a function that relates the length of an algorithm's input to the number of steps it takes or the number of storage locations it uses. An algorithm is said to be efficient when this function's values are small, or grow slowly compared to a growth in the size of the input. Different inputs of the same length may cause the algorithm to have different behavior, so best, worst and average case descriptions might all be of practical interest. When not otherwise specified, the function describing the performance of an algorithm is usually an upper bound, determined from the worst case inputs to the algorithm.

Binary search algorithm Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

In computer science, the computational complexity, or simply complexity of an algorithm is the amount of resources required for running it. The computational complexity of a problem is the minimum of the complexities of all possible algorithms for this problem.

Merge sort A divide and combine sorting algorithm

In computer science, merge sort is an efficient, general-purpose, comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same in the input and output. Merge sort is a divide and conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up mergesort appeared in a report by Goldstine and von Neumann as early as 1948.

An abstract machine, also called an abstract computer, is a theoretical model of a computer hardware or software system used in automata theory. Abstraction of computing processes is used in both the computer science and computer engineering disciplines and usually assumes a discrete time paradigm.

In computer science, divide and conquer is an algorithm design paradigm based on multi-branched recursion. A divide-and-conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

In computer science, program optimization or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.

Judy array data structure implementing a type of associative array with high performance and low memory usage

In computer science, a Judy array is a data structure implementing a type of associative array with high performance and low memory usage. Unlike most other key-value stores, Judy arrays use no hashing, leverage compression on their keys, and can efficiently represent sparse data, that is, they may have large ranges of unassigned indices without greatly increasing memory usage or processing time. They are designed to remain efficient even on structures with sizes in the peta-element range, with performance scaling on the order of O(log256n). Roughly speaking, Judy arrays are highly optimized 256-ary radix trees.

A space–time or time–memory trade-off in computer science is a case where an algorithm or program trades increased space usage with decreased time. Here, space refers to the data storage consumed in performing a given task, and time refers to the time consumed in performing a given task.

External sorting is a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device and instead they must reside in the slower external memory, usually a hard disk drive. Thus, external sorting algorithms are external memory algorithms and thus applicable in the external memory model of computation.

In computing, a cache-oblivious algorithm is an algorithm designed to take advantage of a CPU cache without having the size of the cache as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally. Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit blocking, as in loop nest optimization, which explicitly breaks a problem into blocks that are optimally sized for a given cache.

In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that is too large to fit into a computer's main memory at one time. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory such as hard drives or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model.

In computing, hardware acceleration is the use of computer hardware specially made to perform some functions more efficiently than is possible in software running on a general-purpose CPU. Any transformation of data or routine that can be computed, can be calculated purely in software running on a generic CPU, purely in custom-made hardware, or in some mix of both. An operation can be computed faster in application-specific hardware designed or programmed to compute the operation than specified in software and performed on a general-purpose computer processor. Each approach has advantages and disadvantages. The implementation of computing tasks in hardware to decrease latency and increase throughput is known as hardware acceleration.

In computational number theory, a variety of algorithms make it possible to generate prime numbers efficiently. These are used in various applications, for example hashing, public-key cryptography, and search of prime factors in large numbers.

In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm. It is a term commonly encountered in computer science research as a result of widespread use of big-O notation.

In computing, computer performance is the amount of useful work accomplished by a computer system. Outside of specific contexts, computer performance is estimated in terms of accuracy, efficiency and speed of executing computer program instructions. When it comes to high computer performance, one or more of the following factors might be involved:

Flashsort is a distribution sorting algorithm showing linear computational complexity for uniformly distributed data sets and relatively little additional memory requirement. The original work was published in 1998 by Karl-Dietrich Neubert.

A sorted array is an array data structure in which each element is sorted in numerical, alphabetical, or some other order, and placed at equally spaced addresses in computer memory. It is typically used in computer science to implement static lookup tables to hold multiple values which have the same data type. Sorting an array is useful in organising data in ordered form and recovering them rapidly.

Sun-Ni's Law, is a memory-bounded speedup model which states that as computing power increases the corresponding increase in problem size is constrained by the system’s memory capacity. In general, as a system grows in computational power, the problems run on the system increase in size. Analogous to Amdahl's law, which says that the problem size remains constant as system sizes grow, and Gustafson's law, which proposes that the problem size should scale but be bound by a fixed amount of time, Sun-Ni's Law states the problem size should scale but be bound by the memory capacity of the system. Sun-Ni's Law was initially proposed by Xian-He Sun and Lionel Ni at the Proceedings of IEEE Supercomputing Conference 1990.

References

  1. Green, Christopher, Classics in the History of Psychology , retrieved 19 May 2013
  2. Knuth, Donald (1974), "Structured Programming with go-to Statements" (PDF), Computing Surveys, 6 (4): 261–301, CiteSeerX   10.1.1.103.6084 , doi:10.1145/356635.356640, archived from the original (PDF) on 24 August 2009, retrieved 19 May 2013
  3. 1 2 "Floating Point Benchmark: Comparing Languages (Fourmilog: None Dare Call It Reason)". Fourmilab.ch. 4 August 2005. Retrieved 14 December 2011.
  4. "Whetstone Benchmark History". Roylongbottom.org.uk. Retrieved 14 December 2011.
  5. Staff, OSNews. "Nine Language Performance Round-up: Benchmarking Math & File I/O". www.osnews.com. Retrieved 18 September 2018.
  6. Kriegel, Hans-Peter; Schubert, Erich; Zimek, Arthur (2016). "The (black) art of runtime evaluation: Are we comparing algorithms or implementations?". Knowledge and Information Systems. 52 (2): 341–378. doi:10.1007/s10115-016-1004-2. ISSN   0219-1377.
  7. Guy Lewis Steele, Jr. "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO". MIT AI Lab. AI Lab Memo AIM-443. October 1977.
  8. 1 2 3 4 Hennessy, John L; Patterson, David A; Asanović, Krste; Bakos, Jason D; Colwell, Robert P; Bhattacharjee, Abhishek; Conte, Thomas M; Duato, José; Franklin, Diana; Goldberg, David; Jouppi, Norman P; Li, Sheng; Muralimanohar, Naveen; Peterson, Gregory D; Pinkston, Timothy Mark; Ranganathan, Prakash; Wood, David Allen; Young, Clifford; Zaky, Amr (2011). Computer Architecture: a Quantitative Approach (Sixth ed.). ISBN   978-0128119051. OCLC   983459758.
  9. "Archived copy" (PDF). Archived from the original (PDF) on 3 March 2016. Retrieved 23 February 2009.CS1 maint: archived copy as title (link)
  10. "The Failure of the Digital Computer".
  11. Fagone, Jason (29 November 2010). "Teen Mathletes Do Battle at Algorithm Olympics". Wired.