# Randomized algorithm

Last updated

A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits. Formally, the algorithm's performance will be a random variable determined by the random bits; thus either the running time, or the output (or both) are random variables.

In mathematics and computer science, an algorithm is a sequence of instructions, typically to solve a class of problems or perform a computation. Algorithms are unambiguous specifications for performing calculation, data processing, automated reasoning, and other tasks.

Randomness is the lack of pattern or predictability in events. A random sequence of events, symbols or steps has no order and does not follow an intelligible pattern or combination. Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over numerous events is predictable. For example, when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will occur twice as often as 4. In this view, randomness is a measure of uncertainty of an outcome, rather than haphazardness, and applies to concepts of chance, probability, and information entropy.

In probability and statistics, a random variable, random quantity, aleatory variable, or stochastic variable is described informally as a variable whose values depend on outcomes of a random phenomenon. The formal mathematical treatment of random variables is a topic in probability theory. In that context, a random variable is understood as a measurable function defined on a probability space whose outcomes are typically real numbers.

## Contents

One has to distinguish between algorithms that use the random input so that they always terminate with the correct answer, but where the expected running time is finite (Las Vegas algorithms, for example Quicksort [1] ), and algorithms which have a chance of producing an incorrect result (Monte Carlo algorithms, for example the Monte Carlo algorithm for the MFAS problem [2] ) or fail to produce a result either by signaling a failure or failing to terminate. In some cases, probabilistic algorithms are the only practical means of solving a problem. [3]

In computing, a Las Vegas algorithm is a randomized algorithm that always gives correct results; that is, it always produces the correct result or it informs about the failure. However, the runtime of a Las Vegas algorithm differs depending on the input. The usual definition of a Las Vegas algorithm includes the restriction that the expected runtime be finite, where the expectation is carried out over the space of random information, or entropy, used in the algorithm. An alternative definition requires that a Las Vegas algorithm always terminates, but may output a symbol not part of the solution space to indicate failure in finding a solution. The nature of Las Vegas algorithms make them suitable in situations where the number of possible solutions is limited, and where verifying the correctness of a candidate solution is relatively easy while finding a solution is complex.

Quicksort is an efficient sorting algorithm, serving as a systematic method for placing the elements of a random access file or an array in order. Developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. When implemented well, it can be about two or three times faster than its main competitors, merge sort and heapsort.

In computing, a Monte Carlo algorithm is a randomized algorithm whose output may be incorrect with a certain probability. Two examples of such algorithms are Karger–Stein algorithm and Monte Carlo algorithm for minimum Feedback arc set.

In common practice, randomized algorithms are approximated using a pseudorandom number generator in place of a true source of random bits; such an implementation may deviate from the expected theoretical behavior.

A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed. Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.

## Motivation

As a motivating example, consider the problem of finding an ‘a’ in an array of n elements.

In computer science, an array data structure, or simply an array, is a data structure consisting of a collection of elements, each identified by at least one array index or key. An array is stored such that the position of each element can be computed from its index tuple by a mathematical formula. The simplest type of data structure is a linear array, also called one-dimensional array.

Input: An array of n≥2 elements, in which half are ‘a’s and the other half are ‘b’s.

Output: Find an ‘a’ in the array.

We give two versions of the algorithm, one Las Vegas algorithm and one Monte Carlo algorithm.

Las Vegas algorithm:

findingA_LV(arrayA,n)beginrepeatRandomlyselectoneelementoutofnelements.until'a'isfoundend

This algorithm succeeds with probability 1. The number of iterations varies and can be arbitrarily large, but the expected number of iterations is

${\displaystyle \lim _{n\to \infty }\sum _{i\mathop {=} 1}^{n}{\frac {i}{2^{i}}}=2}$

Since it is constant the expected run time over many calls is ${\displaystyle \Theta (1)}$. (See Big O notation)

Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. It is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others, collectively called Bachmann–Landau notation or asymptotic notation.

Monte Carlo algorithm:

findingA_MC(arrayA,n,k)begini=0repeatRandomlyselectoneelementoutofnelements.i=i+1untili=kor'a'isfoundend

If an ‘a’ is found, the algorithm succeeds, else the algorithm fails. After k iterations, the probability of finding an ‘a’ is:

${\displaystyle \Pr[\mathrm {find~a} ]=1-(1/2)^{k}}$

This algorithm does not guarantee success, but the run time is bounded. The number of iterations is always less than or equal to k. Taking k to be constant the run time (expected and absolute) is ${\displaystyle \Theta (1)}$.

Randomized algorithms are particularly useful when faced with a malicious "adversary" or attacker who deliberately tries to feed a bad input to the algorithm (see worst-case complexity and competitive analysis (online algorithm)) such as in the Prisoner's dilemma. It is for this reason that randomness is ubiquitous in cryptography. In cryptographic applications, pseudo-random numbers cannot be used, since the adversary can predict them, making the algorithm effectively deterministic. Therefore, either a source of truly random numbers or a cryptographically secure pseudo-random number generator is required. Another area in which randomness is inherent is quantum computing.

In the example above, the Las Vegas algorithm always outputs the correct answer, but its running time is a random variable. The Monte Carlo algorithm (related to the Monte Carlo method for simulation) is guaranteed to complete in an amount of time that can be bounded by a function the input size and its parameter k, but allows a small probability of error. Observe that any Las Vegas algorithm can be converted into a Monte Carlo algorithm (via Markov's inequality), by having it output an arbitrary, possibly incorrect answer if it fails to complete within a specified time. Conversely, if an efficient verification procedure exists to check whether an answer is correct, then a Monte Carlo algorithm can be converted into a Las Vegas algorithm by running the Monte Carlo algorithm repeatedly till a correct answer is obtained.

## Computational complexity

Computational complexity theory models randomized algorithms as probabilistic Turing machines . Both Las Vegas and Monte Carlo algorithms are considered, and several complexity classes are studied. The most basic randomized complexity class is RP, which is the class of decision problems for which there is an efficient (polynomial time) randomized algorithm (or probabilistic Turing machine) which recognizes NO-instances with absolute certainty and recognizes YES-instances with a probability of at least 1/2. The complement class for RP is co-RP. Problem classes having (possibly nonterminating) algorithms with polynomial time average case running time whose output is always correct are said to be in ZPP.

The class of problems for which both YES and NO-instances are allowed to be identified with some error is called BPP. This class acts as the randomized equivalent of P, i.e. BPP represents the class of efficient randomized algorithms.

## History

Historically, the first randomized algorithm was a method developed by Michael O. Rabin for the closest pair problem in computational geometry. [4] The study of randomized algorithms was spurred by the 1977 discovery of a randomized primality test (i.e., determining the primality of a number) by Robert M. Solovay and Volker Strassen. Soon afterwards Michael O. Rabin demonstrated that the 1976 Miller's primality test can be turned into a randomized algorithm. At that time, no practical deterministic algorithm for primality was known.

The Miller-Rabin primality test relies on a binary relation between two positive integers k and n that can be expressed by saying that k "is a witness to the compositeness of" n. It can be shown that

• If there is a witness to the compositeness of n, then n is composite (i.e., n is not prime), and
• If n is composite then at least three-fourths of the natural numbers less than n are witnesses to its compositeness, and
• There is a fast algorithm that, given k and n, ascertains whether k is a witness to the compositeness of n.

Observe that this implies that the primality problem is in Co-RP.

If one randomly chooses 100 numbers less than a composite number n, then the probability of failing to find such a "witness" is (1/4)100 so that for most practical purposes, this is a good primality test. If n is big, there may be no other test that is practical. The probability of error can be reduced to an arbitrary degree by performing enough independent tests.

Therefore, in practice, there is no penalty associated with accepting a small probability of error, since with a little care the probability of error can be made astronomically small. Indeed, even though a deterministic polynomial-time primality test has since been found (see AKS primality test), it has not replaced the older probabilistic tests in cryptographic software nor is it expected to do so for the foreseeable future.

## Examples

### Quicksort

Quicksort is a familiar, commonly used algorithm in which randomness can be useful. Any deterministic version of this algorithm requires O (n2) time to sort n numbers for some well-defined class of degenerate inputs (such as an already sorted array), with the specific class of inputs that generate this behavior defined by the protocol for pivot selection. However, if the algorithm selects pivot elements uniformly at random, it has a provably high probability of finishing in O(n log n) time regardless of the characteristics of the input.

### Randomized incremental constructions in geometry

In computational geometry, a standard technique to build a structure like a convex hull or Delaunay triangulation is to randomly permute the input points and then insert them one by one into the existing structure. The randomization ensures that the expected number of changes to the structure caused by an insertion is small, and so the expected running time of the algorithm can be upper bounded. This technique is known as randomized incremental construction. [5]

### Min cut

Input: A graph G(V,E)

Output: A cut partitioning the vertices into L and R, with the minimum number of edges between L and R.

Recall that the contraction of two nodes, u and v, in a (multi-)graph yields a new node u ' with edges that are the union of the edges incident on either u or v, except from any edge(s) connecting u and v. Figure 1 gives an example of contraction of vertex A and B. After contraction, the resulting graph may have parallel edges, but contains no self loops.

Karger's [6] basic algorithm:

begin    i=1    repeatrepeat         Take a random edge (u,v)∈ E in G         replace u and v with the contraction u'      until only 2 nodes remain      obtain the corresponding cut result Ci      i=i+1    until i=m    output the minimum cut among C1,C2,...,Cm.  end

In each execution of the outer loop, the algorithm repeats the inner loop until only 2 nodes remain, the corresponding cut is obtained. The run time of one execution is ${\displaystyle O(n)}$, and n denotes the number of vertices. After m times executions of the outer loop, we output the minimum cut among all the results. The figure 2 gives an example of one execution of the algorithm. After execution, we get a cut of size 3.

Lemma 1: Let k be the min cut size, and let C = {e1,e2,...,ek} be the min cut. If, during iteration i, no edge eC is selected for contraction, then Ci = C.

Proof: If G is not connected, then G can be partitioned into L and R without any edge between them. So the min cut in a disconnected graph is 0. Now, assume G is connected. Let V=LR be the partition of V induced by C : C={ {u,v} ∈ E : uL,vR } (well-defined since G is connected). Consider an edge {u,v} of C. Initially, u,v are distinct vertices. As long as we pick an edge ${\displaystyle f\neq e}$, u and v do not get merged. Thus, at the end of the algorithm, we have two compound nodes covering the entire graph, one consisting of the vertices of L and the other consisting of the vertices of R. As in figure 2, the size of min cut is 1, and C = {(A,B)}. If we don't select (A,B) for contraction, we can get the min cut.

Lemma 2: If G is a multigraph with p vertices and whose min cut has size k, then G has at least pk/2 edges.

Proof: Because the min cut is k, every vertex v must satisfy degree(v) ≥ k. Therefore, the sum of the degree is at least pk. But it is well known that the sum of vertex degrees equals 2|E|. The lemma follows.

Analysis of algorithm

The probability that the algorithm succeeds is 1  the probability that all attempts fail. By independence, the probability that all attempts fail is

${\displaystyle \prod _{i=1}^{m}\Pr(C_{i}\neq C)=\prod _{i=1}^{m}(1-\Pr(C_{i}=C)).}$

By lemma 1, the probability that Ci = C is the probability that no edge of C is selected during iteration i. Consider the inner loop and let Gj denote the graph after j edge contractions, where j  {0,1,...,n  3}. Gj has n  j vertices. We use the chain rule of conditional possibilities. The probability that the edge chosen at iteration j is not in C, given that no edge of C has been chosen before, is ${\displaystyle 1-{\frac {k}{|E(G_{j})|}}}$. Note that Gj still has min cut of size k, so by Lemma 2, it still has at least ${\displaystyle {\frac {(n-j)k}{2}}}$ edges.

Thus, ${\displaystyle 1-{\frac {k}{|E(G_{j})|}}\geq 1-{\frac {2}{n-j}}={\frac {n-j-2}{n-j}}}$.

So by the chain rule, the probability of finding the min cut C is ${\displaystyle Pr[C_{i}=C]\geq \left({\frac {n-2}{n}}\right)\left({\frac {n-3}{n-1}}\right)\left({\frac {n-4}{n-2}}\right)\ldots \left({\frac {3}{5}}\right)\left({\frac {2}{4}}\right)\left({\frac {1}{3}}\right).}$

Cancellation gives ${\displaystyle \Pr[C_{i}=C]\geq {\frac {2}{n(n-1)}}}$. Thus the probability that the algorithm succeeds is at least ${\displaystyle 1-\left(1-{\frac {2}{n(n-1)}}\right)^{m}}$. For ${\displaystyle m={\frac {n(n-1)}{2}}\ln n}$, this is equivalent to ${\displaystyle 1-{\frac {1}{n}}}$. The algorithm finds the min cut with probability ${\displaystyle 1-{\frac {1}{n}}}$, in time ${\displaystyle O(mn)=O(n^{3}\log n)}$.

## Derandomization

Randomness can be viewed as a resource, like space and time. Derandomization is then the process of removing randomness (or using as little of it as possible). It is not currently known if all algorithms can be derandomized without significantly increasing their running time. For instance, in computational complexity, it is unknown whether P = BPP, i.e., we do not know whether we can take an arbitrary randomized algorithm that run in polynomial time with a small error probability and derandomize it to run in polynomial time without using randomness.

There are specific methods that can be employed to derandomize particular randomized algorithms:

## Where randomness helps

When the model of computation is restricted to Turing machines, it is currently an open question whether the ability to make random choices allows some problems to be solved in polynomial time that cannot be solved in polynomial time without this ability; this is the question of whether P = BPP. However, in other contexts, there are specific examples of problems where randomization yields strict improvements.

• Based on the initial motivating example: given an exponentially long string of 2k characters, half a's and half b's, a random access machine requires at least 2k1 lookups in the worst-case to find the index of an a; if it is permitted to make random choices, it can solve this problem in an expected polynomial number of lookups.
• The natural way of carrying out a numerical computation in embedded systems or cyber-physical systems is to provide a result that approximates the correct one with high probability (or Probably Approximately Correct Computation (PACC)). The hard problem associated with the evaluation of the discrepancy loss between the approximated and the correct computation can be effectively addressed by resorting to randomization [7]
• In communication complexity, the equality of two strings can be verified to some reliability using ${\displaystyle \log n}$ bits of communication with a randomized protocol. Any deterministic protocol requires ${\displaystyle \Theta (n)}$ bits if defending against a strong opponent. [8]
• The volume of a convex body can be estimated by a randomized algorithm to arbitrary precision in polynomial time. [9] Bárány and Füredi showed that no deterministic algorithm can do the same. [10] This is true unconditionally, i.e. without relying on any complexity-theoretic assumptions, assuming the convex body can be queried only as a black box.
• A more complexity-theoretic example of a place where randomness appears to help is the class IP. IP consists of all languages that can be accepted (with high probability) by a polynomially long interaction between an all-powerful prover and a verifier that implements a BPP algorithm. IP = PSPACE. [11] However, if it is required that the verifier be deterministic, then IP = NP.
• In a chemical reaction network (a finite set of reactions like A+B → 2C + D operating on a finite number of molecules), the ability to ever reach a given target state from an initial state is decidable, while even approximating the probability of ever reaching a given target state (using the standard concentration-based probability for which reaction will occur next) is undecidable. More specifically, a limited Turing machine can be simulated with arbitrarily high probability of running correctly for all time, only if a random chemical reaction network is used. With a simple nondeterministic chemical reaction network (any possible reaction can happen next), the computational power is limited to primitive recursive functions. [12]

## Notes

1. Hoare, C. A. R. (July 1961). "Algorithm 64: Quicksort". Commun. ACM. 4 (7): 321–. doi:10.1145/366622.366644. ISSN   0001-0782.
2. Kudelić, Robert (2016-04-01). "Monte-Carlo randomized algorithm for minimal feedback arc set problem". Applied Soft Computing. 41: 235–246. doi:10.1016/j.asoc.2015.12.018.
3. "In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a 'correct' algorithm. Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering." Hal Abelson and Gerald J. Sussman (1996). Structure and Interpretation of Computer Programs . MIT Press, section 1.2.
4. Smid, Michiel. Closest point problems in computational geometry. Max-Planck-Institut für Informatik|year=1995
5. A. A. Tsay, W. S. Lovejoy, David R. Karger, Random Sampling in Cut, Flow, and Network Design Problems , Mathematics of Operations Research, 24(2):383–413, 1999.
6. Alippi, Cesare (2014), Intelligence for Embedded Systems, Springer, ISBN   978-3-319-05278-6 .
7. Kushilevitz, Eyal; Nisan, Noam (2006), Communication Complexity, Cambridge University Press, ISBN   9780521029834 . For the deterministic lower bound see p. 11; for the logarithmic randomized upper bound see pp. 31–32.
8. Dyer, M.; Frieze, A.; Kannan, R. (1991), "A random polynomial-time algorithm for approximating the volume of convex bodies" (PDF), Journal of the ACM , 38 (1): 1–17, doi:10.1145/102782.102783
9. Füredi, Z.; Bárány, I. (1986), "Computing the volume is difficult", Proc. 18th ACM Symposium on Theory of Computing (Berkeley, California, May 28–30, 1986)(PDF), New York, NY: ACM, pp. 442–447, doi:10.1145/12130.12176, ISBN   0-89791-193-8 https://ecommons.cornell.edu/bitstream/1813/8572/1/TR000688.pdf |url= missing title (help)
10. Shamir, A. (1992), "IP = PSPACE", Journal of the ACM, 39 (4): 869–877, doi:10.1145/146585.146609
11. Cook, Matthew; Soloveichik, David; Winfree, Erik; Bruck, Jehoshua (2009), "Programmability of chemical reaction networks", in Condon, Anne; Harel, David; Kok, Joost N.; Salomaa, Arto; Winfree, Erik (eds.), Algorithmic Bioprocesses (PDF), Natural Computing Series, Springer-Verlag, pp. 543–584, doi:10.1007/978-3-540-88869-7_27 .

## Related Research Articles

In computational complexity theory, bounded-error probabilistic polynomial time (BPP) is the class of decision problems solvable by a probabilistic Turing machine in polynomial time with an error probability bounded away from 1/3 for all instances. BPP is one of the largest practical classes of problems, meaning most problems of interest in BPP have efficient probabilistic algorithms that can be run quickly on real modern machines. BPP also contains P, the class of problems solvable in polynomial time with a deterministic machine, since a deterministic machine is a special case of a probabilistic machine.

Computational complexity theory focuses on classifying computational problems according to their inherent difficulty, and relating these classes to each other. A computational problem is a task solved by a computer. A computation problem is solvable by mechanical application of mathematical steps, such as an algorithm.

In computational complexity theory, NP is a complexity class used to classify decision problems. NP is the set of decision problems for which the problem instances, where the answer is "yes", have proofs verifiable in polynomial time.

In computational complexity theory, randomized polynomial time (RP) is the complexity class of problems for which a probabilistic Turing machine exists with these properties:

A primality test is an algorithm for determining whether an input number is prime. Among other fields of mathematics, it is used for cryptography. Unlike integer factorization, primality tests do not generally give prime factors, only stating whether the input number is prime or not. Factorization is thought to be a computationally difficult problem, whereas primality testing is comparatively easy. Some primality tests prove that a number is prime, while others like Miller–Rabin prove that a number is composite. Therefore, the latter might more accurately be called compositeness tests instead of primality tests.

In computability theory, a probabilistic Turing machine is a non-deterministic Turing machine which chooses between the available transitions at each point according to some probability distribution.

In computer science, the time complexity is the computational complexity that describes the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to differ by at most a constant factor.

In complexity theory, PP is the class of decision problems solvable by a probabilistic Turing machine in polynomial time, with an error probability of less than 1/2 for all instances. The abbreviation PP refers to probabilistic polynomial time. The complexity class was defined by Gill in 1977.

In computational complexity theory, an Arthur–Merlin protocol is an interactive proof system in which the verifier's coin tosses are constrained to be public. This notion was introduced by Babai (1985). Goldwasser & Sipser (1986) proved that all (formal) languages with interactive proofs of arbitrary length with private coins also have interactive proofs with public coins.

In computational complexity theory, the complexity class NEXPTIME is the set of decision problems that can be solved by a non-deterministic Turing machine using time .

The Solovay–Strassen primality test, developed by Robert M. Solovay and Volker Strassen, is a probabilistic test to determine if a number is composite or probably prime. It has been largely superseded by the Baillie-PSW primality test and the Miller–Rabin primality test, but has great historical importance in showing the practical feasibility of the RSA cryptosystem. The Solovay–Strassen test is essentially an Euler–Jacobi pseudoprime test.

In computational complexity theory, NL is the complexity class containing decision problems which can be solved by a nondeterministic Turing machine using a logarithmic amount of memory space.

Randomized Logarithmic-space (RL), sometimes called RLP, is the complexity class of computational complexity theory problems solvable in logarithmic space and polynomial time with probabilistic Turing machines with one-sided error. It is named in analogy with RP, which is similar but has no logarithmic space restriction.

In theoretical computer science and cryptography, a pseudorandom generator (PRG) for a class of statistical tests is a deterministic procedure that maps a random seed to a longer pseudorandom string such that no statistical test in the class can distinguish between the output of the generator and the uniform distribution. The random seed is typically a short binary string drawn from the uniform distribution.

In computational complexity theory, P/poly is the complexity class of languages recognized by a polynomial-time Turing machine with a polynomial-bounded advice function. It is also equivalently defined as the class PSIZE of languages that have polynomial-size circuits. This means that the machine that recognizes a language may use a different advice function or use a different circuit depending on the length of the input, and that the advice function or circuit will vary only on the size of the input.

In mathematics, the Schwartz–Zippel lemma is a tool commonly used in probabilistic polynomial identity testing, i.e. in the problem of determining whether a given multivariate polynomial is the 0-polynomial. It was discovered independently by Jack Schwartz, Richard Zippel, and Richard DeMillo and Richard J. Lipton, although DeMillo and Lipton's version was shown a year prior to Schwartz and Zippel's result. The finite field version of this bound was proved by Øystein Ore in 1922.

In computational complexity theory, BPL, sometimes called BPLP, is the complexity class of problems solvable in logarithmic space and polynomial time with probabilistic Turing machines with two-sided error. It is named in analogy with BPP, which is similar but has no logarithmic space restriction.

For a graph, a maximum cut is a cut whose size is at least the size of any other cut. The problem of finding a maximum cut in a graph is known as the Max-Cut Problem.

In mathematics, polynomial identity testing (PIT) is the problem of efficiently determining whether two multivariate polynomials are identical. More formally, a PIT algorithm is given an arithmetic circuit that computes a polynomial p in a field, and decides whether p is the zero polynomial. Determining the computational complexity required for polynomial identity testing is one of the most important open problems in algebraic computing complexity.