Las Vegas algorithm

In computing, a Las Vegas algorithm is a randomized algorithm that always gives correct results; that is, it either produces the correct result or reports failure. However, the runtime of a Las Vegas algorithm differs depending on the input. The usual definition of a Las Vegas algorithm includes the restriction that the expected runtime be finite, where the expectation is taken over the space of random information, or entropy, used in the algorithm. An alternative definition requires that a Las Vegas algorithm always terminate (be effective) but allows it to output a symbol outside the solution space to indicate failure to find a solution. [1] The nature of Las Vegas algorithms makes them suitable in situations where the number of possible solutions is limited, and where verifying the correctness of a candidate solution is relatively easy while actually finding a solution is complex.

Systematic search methods for computationally hard problems, such as some variants of the Davis–Putnam algorithm for propositional satisfiability (SAT), also utilize non-deterministic decisions, and can thus also be considered Las Vegas algorithms. [2]

History

Las Vegas algorithms were introduced by László Babai in 1979, in the context of the graph isomorphism problem, as a dual to Monte Carlo algorithms. [3] Babai [4] introduced the term "Las Vegas algorithm" alongside an example involving coin flips: the algorithm depends on a series of independent coin flips, and there is a small chance of failure (no result). However, in contrast to Monte Carlo algorithms, the Las Vegas algorithm can guarantee the correctness of any reported result.

Example

// Las Vegas algorithm
repeat:
    k = RandInt(n)
    if A[k] == 1, return k;

As mentioned above, Las Vegas algorithms always return correct results. The code above illustrates this property. A variable k is generated randomly; after k is generated, it is used to index the array A. If this index contains the value 1, then k is returned; otherwise, the algorithm repeats the process until it finds a 1. Although this Las Vegas algorithm is guaranteed to find the correct answer, it does not have a fixed runtime; due to the randomization (in line 3 of the above code), an arbitrarily long time may elapse before the algorithm terminates.
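
For instance, if half of the entries of A are 1 (the setting of the comparison example later in this article), each iteration succeeds independently with probability 1/2, so the number of iterations follows a geometric distribution with expectation Σi≥1 i · (1/2)^i = 2. The expected runtime is therefore finite even though the worst-case runtime is unbounded.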

Definition

This section states the conditions under which an algorithm is of the Las Vegas type.

An algorithm A is a Las Vegas algorithm for problem class X, if [5]

  1. whenever for a given problem instance x ∈ X it returns a solution s, s is guaranteed to be a valid solution of x
  2. on each given instance x, the run-time of A is a random variable RTA,x

There are three notions of completeness for Las Vegas algorithms:

  1. Complete Las Vegas algorithms can be guaranteed to solve each solvable problem instance within run-time tmax, where tmax is an instance-dependent constant. Let P(RTA,x ≤ t) denote the probability that A finds a solution for a soluble instance x within time t; then A is complete exactly if for each x there exists some tmax such that P(RTA,x ≤ tmax) = 1.
  2. Approximately complete Las Vegas algorithms solve each soluble problem with a probability converging to 1 as the run-time approaches infinity; that is, for each instance x, lim t→∞ P(RTA,x ≤ t) = 1.
  3. Essentially incomplete Las Vegas algorithms are Las Vegas algorithms that are not approximately complete.

Approximate completeness is primarily of theoretical interest, as the time limits for finding solutions are usually too large to be of practical use.

Application scenarios

Las Vegas algorithms have different criteria for evaluation depending on the problem setting. These criteria are divided into three categories with different time limits, since Las Vegas algorithms do not have a set time complexity. Here are some possible application scenarios:

  1. Type 1: There are no time limits; the algorithm runs until it finds the solution.
  2. Type 2: There is a time limit tmax for finding the outcome.
  3. Type 3: The utility of a solution is determined by the time required to find it.

(Type 1 and Type 2 are special cases of Type 3.)

For Type 1, where there is no time limit, the average run-time can represent the run-time behavior. This is not the case for Type 2.

Here, P(RT ≤ tmax), the probability of finding a solution within the time limit tmax, describes the run-time behavior.

In the case of Type 3, the run-time behavior can only be represented by the run-time distribution function rtd: R → [0,1], defined as rtd(t) = P(RT ≤ t), or its approximation.

The run-time distribution (RTD) is the distinctive way to describe the run-time behavior of a Las Vegas algorithm.

With this data, we can easily obtain other criteria such as the mean run-time, standard deviation, median, percentiles, or success probabilities P(RT ≤ t) for arbitrary time limits t.
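
As an illustration, here is a minimal Python sketch (not from the cited sources) for estimating the RTD empirically; run_once stands for any zero-argument callable that performs one complete run of the Las Vegas algorithm:

import statistics
import time

def empirical_rtd(run_once, trials=1000):
    # Time `trials` independent runs; a Las Vegas algorithm always
    # terminates with a correct result, so only the durations vary.
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    times.sort()

    def rtd(t):
        # Empirical CDF: the fraction of runs finishing within time t,
        # an estimate of P(RT <= t).
        return sum(1 for x in times if x <= t) / len(times)

    return rtd, statistics.mean(times), statistics.median(times)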

Applications

Analogy

Las Vegas algorithms arise frequently in search problems. For example, someone looking for a piece of information online might search related websites until the desired information is found. The time complexity thus ranges from getting "lucky" and finding the content immediately to being "unlucky" and spending large amounts of time. Once the right website is found, there is no possibility of error. [6]

Randomized quicksort

import random

def randomized_quicksort(A):
    # A is an array of n elements
    if len(A) <= 1:
        return A  # A is sorted.
    pivot = A[random.randrange(len(A))]  # pick the pivot element at random
    # Partition A into elements < pivot, == pivot, and > pivot.
    less = [x for x in A if x < pivot]
    equal = [x for x in A if x == pivot]
    greater = [x for x in A if x > pivot]
    # Execute quicksort on the two sides and combine the results
    # in order to obtain a sorted array.
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

A simple example is randomized quicksort, where the pivot is chosen randomly and divides the elements into three partitions: elements less than the pivot, elements equal to the pivot, and elements greater than the pivot. Randomized quicksort may require a lot of resources but always generates the sorted array as output. [citation needed]

It is obvious that quicksort always generates the solution, which in this case is the sorted array. Unfortunately, the time complexity is not as obvious: it turns out that the runtime depends on which elements are picked as pivots.


The runtime of quicksort depends heavily on how well the pivot is selected. If the value of the pivot is either too large or too small, the partition will be unbalanced, resulting in poor runtime efficiency. However, if the value of the pivot is near the middle of the array, then the split will be reasonably well balanced, yielding a faster runtime. Since the pivot is picked randomly, the running time will be good most of the time and bad occasionally.

The average case is harder to pin down, since the analysis depends not on the input distribution but on the random choices that the algorithm makes: the average runtime of quicksort is computed over all possible random choices the algorithm might make when choosing the pivot.

Although the worst-case runtime is Θ(n^2), the average-case runtime is Θ(n log n). It turns out that the worst case does not happen often: for large values of n, the runtime is Θ(n log n) with high probability.

Note that the probability that the pivot is the middle-value element at each step is only 1 out of n, which is rare. However, the runtime is still O(n log n) when the split is 10%–90% instead of 50%–50%, because the depth of the recursion tree is still O(log n), with O(n) work per level of recursion.
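
To make the 10%–90% claim concrete (a routine calculation, not from the cited sources): the larger side of each split retains at most 9/10 of the elements, so after d levels at most (9/10)^d · n elements remain, and the depth is bounded by log10/9 n = O(log n). Multiplying by the O(n) work per level gives O(n log n) in total.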

Randomized greedy algorithm for the Eight queens problem

The eight queens problem is usually solved with a backtracking algorithm. However, a Las Vegas algorithm can be applied; in fact, it is more efficient than backtracking.

Place 8 queens on a chessboard so that none attacks another. Remember that a queen attacks other pieces on the same row, column, and diagonals.

Assume that k rows, 0 ≤ k ≤ 8, are successfully occupied by queens.

If k = 8, then stop with success. Otherwise, proceed to occupy row k + 1.

Calculate all positions on this row not attacked by existing queens. If there are none, then fail. Otherwise, pick one at random, increment k and repeat.

Note that the algorithm simply fails if a queen cannot be placed. But the process can be repeated, and each repetition will generate a different arrangement. [7]
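
A minimal Python sketch of this randomized greedy procedure (the function names are illustrative, not from the source):

import random

def place_queens_randomly(n=8):
    # One Las Vegas trial: fill rows 0..n-1 with queens placed at random
    # safe columns. cols[r] is the column of the queen in row r.
    cols = []
    for row in range(n):
        # Columns in this row not attacked by any queen placed so far
        # (same column, or same diagonal: column distance == row distance).
        safe = [c for c in range(n)
                if all(c != pc and abs(c - pc) != row - pr
                       for pr, pc in enumerate(cols))]
        if not safe:
            return None  # fail: report failure rather than a wrong board
        cols.append(random.choice(safe))
    return cols

def eight_queens():
    # Repeat independent trials until one succeeds (the Las Vegas loop).
    while True:
        placement = place_queens_randomly()
        if placement is not None:
            return placement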

Complexity class

The complexity class of decision problems that have Las Vegas algorithms with expected polynomial runtime is ZPP.

It turns out that

ZPP = RP ∩ co-RP,

which is intimately connected with the way Las Vegas algorithms are sometimes constructed. Namely, the class RP consists of all decision problems for which a randomized polynomial-time algorithm exists that always answers correctly when the correct answer is "no", but is allowed to be wrong with a certain probability bounded away from one when the answer is "yes". When such an algorithm exists both for a problem and for its complement (with the answers "yes" and "no" swapped), the two algorithms can be run simultaneously and repeatedly: run each for a constant number of steps, taking turns, until one of them returns a definitive answer. This is the standard way to construct a Las Vegas algorithm that runs in expected polynomial time. Note that in general there is no worst-case upper bound on the run time of a Las Vegas algorithm.
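
A minimal sketch of this interleaving construction, assuming hypothetical one-round testers rp_yes (an RP algorithm for the problem) and rp_no (an RP algorithm for its complement), each returning True only when certain:

def las_vegas_from_rp_pair(rp_yes, rp_no, x):
    # rp_yes(x): one randomized round; True is always a correct "yes",
    #            False is inconclusive.
    # rp_no(x):  one randomized round for the complement; True certifies
    #            that the answer is "no", False is inconclusive.
    while True:
        if rp_yes(x):
            return True   # RP algorithms never err when they answer "yes"
        if rp_no(x):
            return False  # the complement's "yes" certifies a "no"
        # Both rounds were inconclusive; retry with fresh randomness.
        # Whichever tester matches the true answer succeeds per round with
        # probability bounded away from zero, so the expected number of
        # rounds is constant.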

Optimal Las Vegas algorithm

In order to make a Las Vegas algorithm optimal, the expected run time should be minimized. This can be done by:

  1. Running the Las Vegas algorithm A(x) repeatedly: run it for some number t1 of steps; if A(x) stops within that budget, we are done; otherwise, restart it from the beginning and run it for t2 steps, and so on (see the sketch after this list).
  2. Designing a strategy that is optimal among all strategies for A(x), given full information about the distribution of TA(x).
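
As a sketch of strategy 1, the following Python code restarts a trial under the universal Luby schedule, which is known to be within a logarithmic factor of optimal when the distribution of TA(x) is unknown; run_once and its step budget are hypothetical stand-ins for running A(x):

def luby(i):
    # i-th term (1-indexed) of the Luby restart sequence 1, 1, 2, 1, 1, 2, 4, ...
    k = i.bit_length()
    if i == (1 << k) - 1:                # i has the form 2^k - 1
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # recurse into the repeated prefix

def run_with_restarts(run_once, unit=1000):
    # run_once(budget) runs one independent trial for at most `budget` steps
    # and returns a solution, or None on timeout (an assumed interface).
    j = 1
    while True:
        result = run_once(luby(j) * unit)
        if result is not None:
            return result
        j += 1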

The existence of the optimal strategy is a fascinating theoretical observation. It is not practical in real life, however, because the information about the distribution of TA(x) is rarely available. Furthermore, there is no point in running the experiment repeatedly to obtain information about the distribution, since most of the time the answer is needed only once for any x. [8]

Relation to Monte Carlo algorithms

Las Vegas algorithms can be contrasted with Monte Carlo algorithms, in which the resources used are bounded but the answer may be incorrect with a certain (typically small) probability. A Las Vegas algorithm can be converted into a Monte Carlo algorithm by running it for a set time and generating a random answer when it fails to terminate. By an application of Markov's inequality, we can bound the probability that the Las Vegas algorithm would exceed the fixed limit.
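
Concretely, Markov's inequality gives P(RT ≥ t) ≤ E[RT] / t, so cutting the algorithm off after t = k · E[RT] steps yields a Monte Carlo algorithm whose probability of having to guess is at most 1/k.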

Here is a table comparing Las Vegas and Monte Carlo algorithms: [9]

                         Running time    Correctness
Las Vegas algorithm      probabilistic   certain
Monte Carlo algorithm    certain         probabilistic

If a deterministic way to test for correctness is available, then it is possible to turn a Monte Carlo algorithm into a Las Vegas algorithm; without a way to verify candidate solutions, however, the conversion is hard. On the other hand, changing a Las Vegas algorithm into a Monte Carlo algorithm is easy: run the Las Vegas algorithm for a specific period of time given by a confidence parameter; if it finds a solution within that time, the run is a success, and if not, the output can simply be "sorry".
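
A minimal sketch of the Monte Carlo-to-Las Vegas direction, assuming a hypothetical deterministic checker is_valid(x, s) for candidate solutions:

def las_vegas_from_monte_carlo(monte_carlo, is_valid, x):
    # monte_carlo(x) may return an incorrect candidate; is_valid(x, s) is a
    # deterministic correctness test. Only verified answers are returned, so
    # correctness becomes certain while the run time becomes random.
    while True:
        s = monte_carlo(x)
        if is_valid(x, s):
            return s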

This is an example of Las Vegas and Monte Carlo algorithms for comparison: [10]

Assume that there is an array of even length n. Half of the entries in the array are 0s and the remaining half are 1s. The goal is to find an index that contains a 1.

// Las Vegas algorithm
repeat:
    k = RandInt(n)
    if A[k] == 1, return k

// Monte Carlo algorithm
repeat 300 times:
    k = RandInt(n)
    if A[k] == 1, return k
return "Failed"

Since the Las Vegas algorithm does not end until it finds a 1 in the array, it gambles with the run-time, not the correctness. The Monte Carlo algorithm, on the other hand, runs at most 300 times; whether it will find a 1 within those 300 iterations cannot be known before the code actually executes, so it might find the solution or it might not. Therefore, unlike Las Vegas, Monte Carlo gambles with correctness rather than run-time.

References

Citations

  1. Galbraith, Steven D. (2012). Mathematics of Public Key Cryptography. Cambridge University Press. p. 16. ISBN 978-1-107-01392-6.
  2. Hoos, Holger H. (1998). "On the Empirical Evaluation of Las Vegas Algorithms — Position Paper".
  3. Babai, László (1979). "Monte-Carlo algorithms in graph isomorphism testing".
  4. Hoos, H.H.; Stützle, T. (1998). "Evaluating Las Vegas Algorithms — Pitfalls and Remedies". In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 238–245. Morgan Kaufmann Publishers, San Francisco, CA.
  5. "Randomized Algorithms". Brilliant.org. Retrieved October 24, 2018, from https://brilliant.org/wiki/randomized-algorithms-overview/
  6. Barringer, Howard (December 2010). "Randomized Algorithms - a Brief Introduction" (PDF). www.cs.man.ac.uk. Retrieved 2018-12-08.
  7. Luby, Michael (27 September 1993). "Optimal Speedup of Las Vegas algorithms". Information Processing Letters. 47 (4): 173–180. doi:10.1016/0020-0190(93)90029-9.
  8. Goodrich, Michael (2015). Algorithm Design and Applications: Randomized Algorithms. Wiley. https://nscpolteksby.ac.id/ebook/files/Ebook/Computer%20Engineering/Algorithm%20Design%20and%20Applications%20A4%20(2015)/20.%20Chapter%2019%20-%20Randomized%20Algorithms.pdf. Retrieved October 23, 2018.
  9. Procaccia, Ariel (5 November 2015). "Great Theoretical Ideas in Computer Science" (PDF). www.cs.cmu.edu. Retrieved 3 November 2018.

Sources

  • Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999.
  • "Las Vegas algorithm", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 17 July 2006. (accessed May 9, 2009) Available from: