Divide-and-conquer algorithm

Last updated

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

Contents

The divide-and-conquer technique is the basis of efficient algorithms for many problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g., the Karatsuba algorithm), finding the closest pair of points, syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFT). [1]

Designing efficient divide-and-conquer algorithms can be difficult. As in mathematical induction, it is often necessary to generalize the problem to make it amenable to a recursive solution. The correctness of a divide-and-conquer algorithm is usually proved by mathematical induction, and its computational cost is often determined by solving recurrence relations.

Divide and conquer

Divide-and-conquer approach to sort the list (38, 27, 43, 3, 9, 82, 10) in increasing order. Upper half: splitting into sublists; mid: a one-element list is trivially sorted; lower half: composing sorted sublists. Merge sort algorithm diagram.svg
Divide-and-conquer approach to sort the list (38, 27, 43, 3, 9, 82, 10) in increasing order. Upper half: splitting into sublists; mid: a one-element list is trivially sorted; lower half: composing sorted sublists.

The divide-and-conquer paradigm is often used to find an optimal solution of a problem. Its basic idea is to decompose a given problem into two or more similar, but simpler, subproblems, to solve them in turn, and to compose their solutions to solve the given problem. Problems of sufficient simplicity are solved directly. For example, to sort a given list of n natural numbers, split it into two lists of about n/2 numbers each, sort each of them in turn, and interleave both results appropriately to obtain the sorted version of the given list (see the picture). This approach is known as the merge sort algorithm.

The name "divide and conquer" is sometimes applied to algorithms that reduce each problem to only one sub-problem, such as the binary search algorithm for finding a record in a sorted list (or its analogue in numerical computing, the bisection algorithm for root finding). [2] These algorithms can be implemented more efficiently than general divide-and-conquer algorithms; in particular, if they use tail recursion, they can be converted into simple loops. Under this broad definition, however, every algorithm that uses recursion or loops could be regarded as a "divide-and-conquer algorithm". Therefore, some authors consider that the name "divide and conquer" should be used only when each problem may generate two or more subproblems. [3] The name decrease and conquer has been proposed instead for the single-subproblem class. [4]

An important application of divide and conquer is in optimization,[ example needed ] where if the search space is reduced ("pruned") by a constant factor at each step, the overall algorithm has the same asymptotic complexity as the pruning step, with the constant depending on the pruning factor (by summing the geometric series); this is known as prune and search.

Early historical examples

Early examples of these algorithms are primarily decrease and conquer – the original problem is successively broken down into single subproblems, and indeed can be solved iteratively.

Binary search, a decrease-and-conquer algorithm where the subproblems are of roughly half the original size, has a long history. While a clear description of the algorithm on computers appeared in 1946 in an article by John Mauchly, the idea of using a sorted list of items to facilitate searching dates back at least as far as Babylonia in 200 BC. [5] Another ancient decrease-and-conquer algorithm is the Euclidean algorithm to compute the greatest common divisor of two numbers by reducing the numbers to smaller and smaller equivalent subproblems, which dates to several centuries BC.

An early example of a divide-and-conquer algorithm with multiple subproblems is Gauss's 1805 description of what is now called the Cooley–Tukey fast Fourier transform (FFT) algorithm, [6] although he did not analyze its operation count quantitatively, and FFTs did not become widespread until they were rediscovered over a century later.

An early two-subproblem D&C algorithm that was specifically developed for computers and properly analyzed is the merge sort algorithm, invented by John von Neumann in 1945. [7]

Another notable example is the algorithm invented by Anatolii A. Karatsuba in 1960 [8] that could multiply two n-digit numbers in operations (in Big O notation). This algorithm disproved Andrey Kolmogorov's 1956 conjecture that operations would be required for that task.

As another example of a divide-and-conquer algorithm that did not originally involve computers, Donald Knuth gives the method a post office typically uses to route mail: letters are sorted into separate bags for different geographical areas, each of these bags is itself sorted into batches for smaller sub-regions, and so on until they are delivered. [5] This is related to a radix sort, described for punch-card sorting machines as early as 1929. [5]

Advantages

Solving difficult problems

Divide and conquer is a powerful tool for solving conceptually difficult problems: all it requires is a way of breaking the problem into sub-problems, of solving the trivial cases, and of combining sub-problems to the original problem. Similarly, decrease and conquer only requires reducing the problem to a single smaller problem, such as the classic Tower of Hanoi puzzle, which reduces moving a tower of height to move a tower of height .

Algorithm efficiency

The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the Strassen algorithm for matrix multiplication, and fast Fourier transforms.

In all these examples, the D&C approach led to an improvement in the asymptotic cost of the solution. For example, if (a) the base cases have constant-bounded size, the work of splitting the problem and combining the partial solutions is proportional to the problem's size , and (b) there is a bounded number of sub-problems of size ~ at each stage, then the cost of the divide-and-conquer algorithm will be .

Parallelism

Divide-and-conquer algorithms are naturally adapted for execution in multi-processor machines, especially shared-memory systems where the communication of data between processors does not need to be planned in advance because distinct sub-problems can be executed on different processors.

Memory access

Divide-and-conquer algorithms naturally tend to make efficient use of memory caches. The reason is that once a sub-problem is small enough, it and all its sub-problems can, in principle, be solved within the cache, without accessing the slower main memory. An algorithm designed to exploit the cache in this way is called cache-oblivious , because it does not contain the cache size as an explicit parameter. [9] Moreover, D&C algorithms can be designed for important algorithms (e.g., sorting, FFTs, and matrix multiplication) to be optimal cache-oblivious algorithms–they use the cache in a probably optimal way, in an asymptotic sense, regardless of the cache size. In contrast, the traditional approach to exploiting the cache is blocking, as in loop nest optimization, where the problem is explicitly divided into chunks of the appropriate size—this can also use the cache optimally, but only when the algorithm is tuned for the specific cache sizes of a particular machine.

The same advantage exists with regards to other hierarchical storage systems, such as NUMA or virtual memory, as well as for multiple levels of cache: once a sub-problem is small enough, it can be solved within a given level of the hierarchy, without accessing the higher (slower) levels.

Roundoff control

In computations with rounded arithmetic, e.g. with floating-point numbers, a divide-and-conquer algorithm may yield more accurate results than a superficially equivalent iterative method. For example, one can add N numbers either by a simple loop that adds each datum to a single variable, or by a D&C algorithm called pairwise summation that breaks the data set into two halves, recursively computes the sum of each half, and then adds the two sums. While the second method performs the same number of additions as the first and pays the overhead of the recursive calls, it is usually more accurate. [10]

Implementation issues

Recursion

Divide-and-conquer algorithms are naturally implemented as recursive procedures. In that case, the partial sub-problems leading to the one currently being solved are automatically stored in the procedure call stack. A recursive function is a function that calls itself within its definition.

Explicit stack

Divide-and-conquer algorithms can also be implemented by a non-recursive program that stores the partial sub-problems in some explicit data structure, such as a stack, queue, or priority queue. This approach allows more freedom in the choice of the sub-problem that is to be solved next, a feature that is important in some applications — e.g. in breadth-first recursion and the branch-and-bound method for function optimization. This approach is also the standard solution in programming languages that do not provide support for recursive procedures.

Stack size

In recursive implementations of D&C algorithms, one must make sure that there is sufficient memory allocated for the recursion stack, otherwise, the execution may fail because of stack overflow. D&C algorithms that are time-efficient often have relatively small recursion depth. For example, the quicksort algorithm can be implemented so that it never requires more than nested recursive calls to sort items.

Stack overflow may be difficult to avoid when using recursive procedures since many compilers assume that the recursion stack is a contiguous area of memory, and some allocate a fixed amount of space for it. Compilers may also save more information in the recursion stack than is strictly necessary, such as return address, unchanging parameters, and the internal variables of the procedure. Thus, the risk of stack overflow can be reduced by minimizing the parameters and internal variables of the recursive procedure or by using an explicit stack structure.

Choosing the base cases

In any recursive algorithm, there is considerable freedom in the choice of the base cases, the small subproblems that are solved directly in order to terminate the recursion.

Choosing the smallest or simplest possible base cases is more elegant and usually leads to simpler programs, because there are fewer cases to consider and they are easier to solve. For example, an FFT algorithm could stop the recursion when the input is a single sample, and the quicksort list-sorting algorithm could stop when the input is the empty list; in both examples, there is only one base case to consider, and it requires no processing.

On the other hand, efficiency often improves if the recursion is stopped at relatively large base cases, and these are solved non-recursively, resulting in a hybrid algorithm. This strategy avoids the overhead of recursive calls that do little or no work and may also allow the use of specialized non-recursive algorithms that, for those base cases, are more efficient than explicit recursion. A general procedure for a simple hybrid recursive algorithm is short-circuiting the base case, also known as arm's-length recursion . In this case, whether the next step will result in the base case is checked before the function call, avoiding an unnecessary function call. For example, in a tree, rather than recursing to a child node and then checking whether it is null, checking null before recursing; avoids half the function calls in some algorithms on binary trees. Since a D&C algorithm eventually reduces each problem or sub-problem instance to a large number of base instances, these often dominate the overall cost of the algorithm, especially when the splitting/joining overhead is low. Note that these considerations do not depend on whether recursion is implemented by the compiler or by an explicit stack.

Thus, for example, many library implementations of quicksort will switch to a simple loop-based insertion sort (or similar) algorithm once the number of items to be sorted is sufficiently small. Note that, if the empty list were the only base case, sorting a list with entries would entail maximally quicksort calls that would do nothing but return immediately. Increasing the base cases to lists of size 2 or less will eliminate most of those do-nothing calls, and more generally a base case larger than 2 is typically used to reduce the fraction of time spent in function-call overhead or stack manipulation.

Alternatively, one can employ large base cases that still use a divide-and-conquer algorithm, but implement the algorithm for predetermined set of fixed sizes where the algorithm can be completely unrolled into code that has no recursion, loops, or conditionals (related to the technique of partial evaluation). For example, this approach is used in some efficient FFT implementations, where the base cases are unrolled implementations of divide-and-conquer FFT algorithms for a set of fixed sizes. [11] Source-code generation methods may be used to produce the large number of separate base cases desirable to implement this strategy efficiently. [11]

The generalized version of this idea is known as recursion "unrolling" or "coarsening", and various techniques have been proposed for automating the procedure of enlarging the base case. [12]

Dynamic programming for overlapping subproblems

For some problems, the branched recursion may end up evaluating the same sub-problem many times over. In such cases it may be worth identifying and saving the solutions to these overlapping subproblems, a technique which is commonly known as memoization. Followed to the limit, it leads to bottom-up divide-and-conquer algorithms such as dynamic programming.

See also

Related Research Articles

<span class="mw-page-title-main">Algorithm</span> Sequence of operations for a task

In mathematics and computer science, an algorithm is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes and deduce valid inferences, achieving automation eventually. Using human characteristics as descriptors of machines in metaphorical ways was already practiced by Alan Turing with terms such as "memory", "search" and "stimulus".

<span class="mw-page-title-main">Merge sort</span> Divide and conquer sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

<span class="mw-page-title-main">Recursion</span> Process of repeating items in a self-similar way

Recursion occurs when the definition of a concept or process depends on a simpler or previous version of itself. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics and computer science, where a function being defined is applied within its own definition. While this apparently defines an infinite number of instances, it is often done in such a way that no infinite loop or infinite chain of references can occur.

In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered. For this reason, radix sort has also been called bucket sort and digital sort.

<span class="mw-page-title-main">Sorting algorithm</span> Algorithm that arranges lists in order

In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size in terms of N1 smaller DFTs of sizes N2, recursively, to reduce the computation time to O(N log N) for highly composite N (smooth numbers). Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below.

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

In the analysis of algorithms, the master theorem for divide-and-conquer recurrences provides an asymptotic analysis for many recurrence relations that occur in the analysis of divide-and-conquer algorithms. The approach was first presented by Jon Bentley, Dorothea Blostein, and James B. Saxe in 1980, where it was described as a "unifying method" for solving such recurrences. The name "master theorem" was popularized by the widely-used algorithms textbook Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.

In computing, a cache-oblivious algorithm is an algorithm designed to take advantage of a processor cache without having the size of the cache as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally. Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit loop tiling, which explicitly breaks a problem into blocks that are optimally sized for a given cache.

<span class="mw-page-title-main">Quickselect</span> Algorithm for the kth smallest element in an array

In computer science, quickselect is a selection algorithm to find the kth smallest element in an unordered list, also known as the kth order statistic. Like the related quicksort sorting algorithm, it was developed by Tony Hoare, and thus is also known as Hoare's selection algorithm. Like quicksort, it is efficient in practice and has good average-case performance, but has poor worst-case performance. Quickselect and its variants are the selection algorithms most often used in efficient real-world implementations.

<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

<span class="mw-page-title-main">Karatsuba algorithm</span> Algorithm for integer multiplication

The Karatsuba algorithm is a fast multiplication algorithm. It was discovered by Anatoly Karatsuba in 1960 and published in 1962. It is a divide-and-conquer algorithm that reduces the multiplication of two n-digit numbers to three multiplications of n/2-digit numbers and, by repeating this reduction, to at most single-digit multiplications. It is therefore asymptotically faster than the traditional algorithm, which performs single-digit products.

Spreadsort is a sorting algorithm invented by Steven J. Ross in 2002. It combines concepts from distribution-based sorts, such as radix sort and bucket sort, with partitioning concepts from comparison sorts such as quicksort and mergesort. In experimental results it was shown to be highly efficient, often outperforming traditional algorithms such as quicksort, particularly on distributions exhibiting structure and string sorting. There is an open-source implementation with performance analysis and benchmarks, and HTML documentation .

The Kirkpatrick–Seidel algorithm, proposed by its authors as a potential "ultimate planar convex hull algorithm", is an algorithm for computing the convex hull of a set of points in the plane, with time complexity, where is the number of input points and is the number of points in the hull. Thus, the algorithm is output-sensitive: its running time depends on both the input size and the output size. Another output-sensitive algorithm, the gift wrapping algorithm, was known much earlier, but the Kirkpatrick–Seidel algorithm has an asymptotic running time that is significantly smaller and that always improves on the bounds of non-output-sensitive algorithms. The Kirkpatrick–Seidel algorithm is named after its inventors, David G. Kirkpatrick and Raimund Seidel.

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems. Conventional divide and conquer sorting algorithms partitions the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing p−1 < s elements from the result. These elements then divide the array into p approximately equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence. Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good while having much lower computational cost—it can be implemented so as to have nearly the same cost as naive summation.

A hybrid algorithm is an algorithm that combines two or more other algorithms that solve the same problem, either choosing one based on some characteristic of the data, or switching between them over the course of the algorithm. This is generally done to combine desired features of each, so that the overall algorithm is better than the individual components.

In the design and analysis of algorithms for combinatorial optimization, parametric search is a technique invented by Nimrod Megiddo for transforming a decision algorithm into an optimization algorithm. It is frequently used for solving optimization problems in computational geometry.

References

  1. Blahut, Richard (14 May 2014). Fast Algorithms for Signal Processing. Cambridge University Press. pp. 139–143. ISBN   978-0-511-77637-3.
  2. Thomas H. Cormen; Charles E. Leiserson; Ronald L. Rivest; Clifford Stein (31 July 2009). Introduction to Algorithms. MIT Press. ISBN   978-0-262-53305-8.
  3. Brassard, G., and Bratley, P. Fundamental of Algorithmics, Prentice-Hall, 1996.
  4. Anany V. Levitin, Introduction to the Design and Analysis of Algorithms (Addison Wesley, 2002).
  5. 1 2 3 Donald E. Knuth, The Art of Computer Programming: Volume 3, Sorting and Searching, second edition (Addison-Wesley, 1998).
  6. Heideman, M. T., D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform", IEEE ASSP Magazine, 1, (4), 14–21 (1984).
  7. Knuth, Donald (1998). The Art of Computer Programming: Volume 3 Sorting and Searching . p.  159. ISBN   0-201-89685-0.
  8. Karatsuba, Anatolii A.; Yuri P. Ofman (1962). "Умножение многозначных чисел на автоматах". Doklady Akademii Nauk SSSR . 146: 293–294. Translated in Karatsuba, A.; Ofman, Yu. (1963). "Multiplication of Multidigit Numbers on Automata". Soviet Physics Doklady. 7: 595–596. Bibcode:1963SPhD....7..595K.
  9. M. Frigo; C. E. Leiserson; H. Prokop (1999). "Cache-oblivious algorithms". 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039). pp. 285–297. doi:10.1109/SFFCS.1999.814600. ISBN   0-7695-0409-4. S2CID   62758836.
  10. Nicholas J. Higham, "The accuracy of floating-point summation", SIAM J. Scientific Computing14 (4), 783–799 (1993).
  11. 1 2 Frigo, M.; Johnson, S. G. (February 2005). "The design and implementation of FFTW3" (PDF). Proceedings of the IEEE. 93 (2): 216–231. CiteSeerX   10.1.1.66.3097 . doi:10.1109/JPROC.2004.840301. S2CID   6644892.
  12. Radu Rugina and Martin Rinard, "Recursion unrolling for divide and conquer programs" in Languages and Compilers for Parallel Computing, chapter 3, pp. 34–48. Lecture Notes in Computer Science vol. 2017 (Berlin: Springer, 2001).