Randomized rounding

In computer science and operations research, randomized rounding[1] is a widely used approach for designing and analyzing approximation algorithms.[2][3]

Many combinatorial optimization problems are computationally intractable to solve exactly (to optimality). For such problems, randomized rounding can be used to design fast (polynomial time) approximation algorithms—that is, algorithms that are guaranteed to return an approximately optimal solution given any input.

The basic idea of randomized rounding is to convert an optimal solution of a relaxation of the problem into an approximately-optimal solution to the original problem. The resulting algorithm is usually analyzed using the probabilistic method.

Overview

The basic approach has three steps:

  1. Formulate the problem to be solved as an integer linear program (ILP).
  2. Compute an optimal fractional solution to the linear programming relaxation (LP) of the ILP.
  3. Round the fractional solution of the LP to an integer solution of the ILP.

(Although the approach is most commonly applied with linear programs, other kinds of relaxations are sometimes used. For example, see the semidefinite programming-based Max-Cut approximation algorithm of Goemans and Williamson.)

In the first step, the challenge is to choose a suitable integer linear program. Familiarity with linear programming, in particular modelling using linear programs and integer linear programs, is required. For many problems, there is a natural integer linear program that works well, such as in the Set Cover example below. (The integer linear program should have a small integrality gap; indeed randomized rounding is often used to prove bounds on integrality gaps.)

In the second step, the optimal fractional solution can typically be computed in polynomial time using any standard linear programming algorithm.

In the third step, the fractional solution must be converted into an integer solution (and thus a solution to the original problem). This is called rounding the fractional solution. The resulting integer solution should (provably) have cost not much larger than the cost of the fractional solution. This will ensure that the cost of the integer solution is not much larger than the cost of the optimal integer solution.

The main technique for the third step (rounding) is to use randomization, and then to use probabilistic arguments to bound the increase in cost due to the rounding (following the probabilistic method from combinatorics). In the probabilistic method, probabilistic arguments are used to show the existence of discrete structures with desired properties. In this context, one uses such arguments to show the following:

Given any fractional solution $x$ of the LP, with positive probability the randomized rounding process produces an integer solution $x'$ that approximates $x$ according to some desired criterion.

Finally, to make the third step computationally efficient, one either shows that $x'$ approximates $x$ with high probability (so that the step can remain randomized), or one derandomizes the rounding step, typically using the method of conditional probabilities. The latter method converts the randomized rounding process into an efficient deterministic process that is guaranteed to reach a good outcome.

Example: the set cover problem

The following example illustrates how randomized rounding can be used to design an approximation algorithm for the set cover problem. Fix any instance of set cover over a universe $\mathcal U$: a collection $\mathcal S$ of subsets of $\mathcal U$, where each set $s \in \mathcal S$ has a non-negative cost $c(s)$, and the goal is to choose a minimum-cost subcollection of $\mathcal S$ whose sets together cover $\mathcal U$.

Computing the fractional solution

For step 1, let IP be the standard integer linear program for set cover for this instance.
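
Written out (in a standard formulation, shown here for concreteness), IP has a 0-1 variable $x_s$ for each set $s \in \mathcal S$:

$$\begin{aligned} \text{minimize} \quad & \sum_{s \in \mathcal S} c(s)\, x_s \\ \text{subject to} \quad & \sum_{s \ni e} x_s \ge 1 \quad \text{for each element } e \in \mathcal U, \\ & x_s \in \{0, 1\} \quad \text{for each set } s \in \mathcal S. \end{aligned}$$

The LP relaxation in step 2 replaces each constraint $x_s \in \{0,1\}$ by $x_s \ge 0$.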

For step 2, let LP be the linear programming relaxation of IP, and compute an optimal solution $x^*$ to LP using any standard linear programming algorithm. This takes time polynomial in the input size. The feasible solutions to LP are the vectors $x$ that assign each set $s \in \mathcal S$ a non-negative weight $x_s$ such that, for each element $e \in \mathcal U$, $x$ covers $e$ -- the total weight assigned to the sets containing $e$ is at least 1, that is,

$$\sum_{s \ni e} x_s \;\ge\; 1.$$

The optimal solution $x^*$ is a feasible solution whose cost

$$c \cdot x^* \;=\; \sum_{s \in \mathcal S} c(s)\, x^*_s$$

is as small as possible. Note that any set cover $\mathcal C$ for the instance gives a feasible solution $x$ (where $x_s = 1$ for $s \in \mathcal C$ and $x_s = 0$ otherwise). The cost of this $x$ equals the cost of $\mathcal C$, that is,

$$\sum_{s \in \mathcal S} c(s)\, x_s \;=\; \sum_{s \in \mathcal C} c(s).$$

In other words, the linear program LP is a relaxation of the given set-cover problem.

Since $x^*$ has minimum cost among feasible solutions to the LP, the cost of $x^*$ is a lower bound on the cost of the optimal set cover.
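
As an illustration of this step (not part of the original presentation), the following Python sketch computes an optimal fractional solution for a tiny made-up instance using scipy.optimize.linprog; the instance data and names such as x_star are ours.

    # Sketch: compute a min-cost fractional set cover x* for a tiny,
    # made-up instance by solving the LP relaxation with scipy.
    import numpy as np
    from scipy.optimize import linprog

    universe = [0, 1, 2, 3]                   # U
    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]   # S
    costs = np.array([1.0, 1.0, 1.0, 1.0])    # c(s) for each s

    # Coverage constraints: for each element e, the total weight of the sets
    # containing e is at least 1.  linprog expects A_ub @ x <= b_ub, so negate.
    A = np.array([[1.0 if e in s else 0.0 for s in sets] for e in universe])
    res = linprog(costs, A_ub=-A, b_ub=-np.ones(len(universe)),
                  bounds=[(0, None)] * len(sets))

    x_star = res.x           # an optimal fractional solution x*
    print(x_star, res.fun)   # res.fun = c . x*, a lower bound on the optimum

For this instance the LP optimum value is 2; in general $c \cdot x^*$ can be strictly smaller than the cost of the best integer cover.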

Randomized rounding step

In step 3, we must convert the minimum-cost fractional set cover $x^*$ into a feasible integer solution $x'$ (corresponding to a true set cover). The rounding step should produce an $x'$ that, with positive probability, has cost within a small factor of the cost of $x^*$. Then (since the cost of $x^*$ is a lower bound on the cost of the optimal set cover), the cost of $x'$ will be within a small factor of the optimal cost.

As a starting point, consider the most natural rounding scheme:

For each set $s \in \mathcal S$ in turn, take $x'_s = 1$ with probability $x^*_s$, otherwise take $x'_s = 0$.

With this rounding scheme, the expected cost of the chosen sets is at most $\sum_{s} c(s)\, x^*_s$, the cost of the fractional cover. This is good. Unfortunately the coverage is not good. When the variables $x^*_s$ are small, the probability that an element $e$ is not covered is about

$$\prod_{s \ni e} (1 - x^*_s) \;\approx\; \prod_{s \ni e} e^{-x^*_s} \;=\; e^{-\sum_{s \ni e} x^*_s} \;\approx\; e^{-1}.$$

So, in expectation, a constant fraction (roughly $1/e$) of the elements will be left uncovered.
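
To see this numerically, here is a small Python simulation on a synthetic fractional solution of our own construction: every element lies in $k$ sets, each of weight $x^*_s = 1/k$, so the naive scheme leaves it uncovered with probability $(1 - 1/k)^k \approx e^{-1}$.

    # Simulate the naive rounding scheme: each element lies in k sets,
    # each taken independently with probability x*_s = 1/k.
    import random

    rng = random.Random(0)
    n, k, trials = 1000, 20, 20
    uncovered = 0
    for _ in range(trials):
        for _ in range(n):
            # The element stays uncovered iff none of its k sets is taken.
            if not any(rng.random() < 1.0 / k for _ in range(k)):
                uncovered += 1
    print(uncovered / (trials * n))   # close to (1 - 1/k)**k, about 0.36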

To make $x'$ cover every element with high probability, the standard rounding scheme first scales up the rounding probabilities by an appropriate factor $\lambda > 1$. Here is the standard rounding scheme:

Fix a parameter $\lambda \ge 1$. For each set $s \in \mathcal S$ in turn,
take $x'_s = 1$ with probability $\min(\lambda x^*_s, 1)$, otherwise take $x'_s = 0$.

Scaling the probabilities up by $\lambda$ increases the expected cost by a factor of at most $\lambda$, but makes coverage of all elements likely. The idea is to choose $\lambda$ as small as possible so that all elements are provably covered with non-zero probability. Here is a detailed analysis.
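
A minimal Python sketch of the scaled rounding scheme, assuming a fractional solution is already available (the helper name round_cover and the tiny instance reuse the illustrative names from the LP sketch above):

    # Sketch of the standard rounding scheme: take each set s with
    # probability min(lambda * x*_s, 1), where lambda = ln(2|U|).
    import math
    import random

    def round_cover(sets, x_star, universe, rng):
        lam = math.log(2 * len(universe))
        x_prime = [1 if rng.random() < min(lam * xs, 1.0) else 0 for xs in x_star]
        covered = set().union(*[s for s, take in zip(sets, x_prime) if take])
        return x_prime, covered == set(universe)

    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
    x_star = [0.5, 0.5, 0.5, 0.5]   # one optimal fractional solution here
    universe = {0, 1, 2, 3}
    print(round_cover(sets, x_star, universe, random.Random(1)))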


Lemma (approximation guarantee for rounding scheme)

Fix $\lambda = \ln(2|\mathcal U|)$. With positive probability, the rounding scheme returns a set cover $x'$ of cost at most $2\lambda\, c\cdot x^*$ (and thus of cost $2\ln(2|\mathcal U|)$ times the cost of the optimal set cover).

(Note: with a more careful analysis, the factor $2\ln(2|\mathcal U|)$ can be reduced to $(1 + o(1))\ln|\mathcal U|$.)

Proof

The output $x'$ of the random rounding scheme has the desired properties as long as none of the following "bad" events occur:

  1. the cost $c \cdot x'$ of $x'$ exceeds $2\lambda\, c\cdot x^*$, or
  2. for some element $e$, $x'$ fails to cover $e$.

The expectation of each $x'_s$ is at most $\lambda x^*_s$. By linearity of expectation, the expectation of $c \cdot x'$ is at most $\sum_{s \in \mathcal S} c(s)\, \lambda x^*_s = \lambda\, c\cdot x^*$. Thus, by Markov's inequality, the probability of the first bad event above is at most $1/2$.

For the remaining bad events (one for each element $e$), note that, since $\sum_{s \ni e} x^*_s \ge 1$ for any given element $e$, the probability that $e$ is not covered is

$$\prod_{s \ni e} \big(1 - \min(\lambda x^*_s, 1)\big) \;<\; \prod_{s \ni e} e^{-\lambda x^*_s} \;=\; e^{-\lambda \sum_{s \ni e} x^*_s} \;\le\; e^{-\lambda} \;=\; \frac{1}{2|\mathcal U|}.$$

(This uses the inequality $1 - z \le e^{-z}$, which is strict for $z \ne 0$.)

Thus, for each of the $|\mathcal U|$ elements, the probability that the element is not covered is less than $1/(2|\mathcal U|)$.

By the union bound, the probability that one of the $1 + |\mathcal U|$ bad events happens is less than $\tfrac{1}{2} + |\mathcal U| \cdot \tfrac{1}{2|\mathcal U|} = 1$. Thus, with positive probability there are no bad events and $x'$ is a set cover of cost at most $2\lambda\, c\cdot x^*$. QED

Derandomization using the method of conditional probabilities

The lemma above shows the existence of a set cover of cost $O(\log|\mathcal U|)\, c \cdot x^*$. In this context our goal is an efficient approximation algorithm, not just an existence proof, so we are not done.

One approach would be to increase $\lambda$ a little bit, then show that the probability of success is at least, say, 1/4. With this modification, repeating the random rounding step a few times is enough to ensure a successful outcome with high probability.
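
As an illustration of that variant (a sketch with parameter choices of our own: $\lambda = \ln(4|\mathcal U|)$ and cost budget $4\lambda\, c\cdot x^*$, for which each attempt succeeds with probability at least 1/2), one can simply retry until the sampled $x'$ is a cover within budget:

    # Sketch: Monte Carlo variant -- round with a slightly larger lambda and
    # retry until the result covers everything within the cost budget.
    import math
    import random

    def repeat_rounding(sets, costs, x_star, universe, rng, max_tries=100):
        lam = math.log(4 * len(universe))   # slightly larger than ln(2|U|)
        lp_cost = sum(c * xs for c, xs in zip(costs, x_star))
        budget = 4 * lam * lp_cost          # relaxed cost budget
        for _ in range(max_tries):
            x_prime = [1 if rng.random() < min(lam * xs, 1.0) else 0
                       for xs in x_star]
            covered = set().union(*[s for s, take in zip(sets, x_prime) if take])
            cost = sum(c for c, take in zip(costs, x_prime) if take)
            if covered == set(universe) and cost <= budget:
                return x_prime
        return None   # each try fails with probability at most 1/2

    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
    costs = [1.0, 1.0, 1.0, 1.0]
    x_star = [0.5, 0.5, 0.5, 0.5]
    print(repeat_rounding(sets, costs, x_star, {0, 1, 2, 3}, random.Random(2)))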

That approach weakens the approximation ratio. We next describe a different approach that yields a deterministic algorithm that is guaranteed to match the approximation ratio of the existence proof above. The approach is called the method of conditional probabilities.

The deterministic algorithm emulates the randomized rounding scheme: it considers each set $s \in \mathcal S$ in turn, and chooses $x'_s \in \{0, 1\}$. But instead of making each choice randomly based on $x^*_s$, it makes the choice deterministically, so as to keep the conditional probability of failure, given the choices so far, below 1.

Bounding the conditional probability of failure

We want to be able to set each variable $x'_s$ in turn so as to keep the conditional probability of failure below 1. To do this, we need a good bound on the conditional probability of failure. The bound will come by refining the original existence proof. That proof implicitly bounds the probability of failure by the expectation of the random variable

$$F \;=\; \frac{c \cdot x'}{2\lambda\, c\cdot x^*} \;+\; |\mathcal U^{(m)}|,$$

where

$$\mathcal U^{(m)} \;=\; \{\, e \in \mathcal U : x'_s = 0 \text{ for every } s \ni e \,\}$$

is the set of elements left uncovered at the end (after all $m = |\mathcal S|$ sets have been considered).

The random variable $F$ may appear a bit mysterious, but it mirrors the probabilistic proof in a systematic way. The first term in $F$ comes from applying Markov's inequality to bound the probability of the first bad event (the cost is too high). It contributes at least 1 to $F$ if the cost of $x'$ is too high. The second term counts the number of bad events of the second kind (uncovered elements). It contributes at least 1 to $F$ if $x'$ leaves any element uncovered. Thus, in any outcome where $F$ is less than 1, $x'$ must cover all the elements and have cost meeting the desired bound from the lemma. In short, if the rounding step fails, then $F \ge 1$. This implies (by Markov's inequality) that $E[F]$ is an upper bound on the probability of failure. Note that the argument above is implicit already in the proof of the lemma, which also shows by calculation that $E[F] < 1$.

To apply the method of conditional probabilities, we need to extend the argument to bound the conditional probability of failure as the rounding step proceeds. Usually, this can be done in a systematic way, although it can be technically tedious.

So, what about the conditional probability of failure as the rounding step iterates through the sets? Since $F \ge 1$ in any outcome where the rounding step fails, by Markov's inequality, the conditional probability of failure is at most the conditional expectation of $F$.

Next we calculate the conditional expectation of $F$, much as we calculated the unconditioned expectation of $F$ in the original proof. Consider the state of the rounding process at the end of some iteration $t$. Let $\mathcal S^{(t)} = \{s_1, \ldots, s_t\}$ denote the sets considered so far (the first $t$ sets in $\mathcal S$, in the order in which the rounding scheme considers them). Let $x^{(t)}$ denote the (partially assigned) vector $x'$ (so $x'_s$ is determined only if $s \in \mathcal S^{(t)}$). For each set $s \notin \mathcal S^{(t)}$, let $p_s = \min(\lambda x^*_s, 1)$ denote the probability with which $x'_s$ will be set to 1. Let $\mathcal U^{(t)}$ contain the not-yet-covered elements. Then the conditional expectation of $F$, given the choices made so far, that is, given $x^{(t)}$, is

$$E[F \mid x^{(t)}] \;=\; \frac{\sum_{s \in \mathcal S^{(t)}} c(s)\, x'_s \;+\; \sum_{s \notin \mathcal S^{(t)}} c(s)\, p_s}{2\lambda\, c\cdot x^*} \;+\; \sum_{e \in \mathcal U^{(t)}} \;\prod_{s \ni e,\; s \notin \mathcal S^{(t)}} (1 - p_s).$$

Note that $E[F \mid x^{(t)}]$ is determined only after iteration $t$.
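
The following Python sketch evaluates this conditional expectation for a partial assignment; it is an illustration only, and names such as cond_expectation and the dictionary encoding of $x^{(t)}$ are ours.

    # Sketch: evaluate E[F | x^(t)] for a partial assignment of x'.
    # `decided` maps a set's index to its chosen value x'_s in {0, 1};
    # an undecided set s would still be taken with probability p_s.
    import math

    def cond_expectation(sets, costs, x_star, universe, decided):
        lam = math.log(2 * len(universe))
        lp_cost = sum(c * xs for c, xs in zip(costs, x_star))   # c . x*
        p = [min(lam * xs, 1.0) for xs in x_star]

        # First term: (committed cost + expected cost of undecided sets),
        # divided by the cost budget 2 * lambda * (c . x*).
        exp_cost = sum(c * decided.get(i, p[i]) for i, c in enumerate(costs))
        term1 = exp_cost / (2 * lam * lp_cost)

        # Second term: expected number of elements left uncovered at the end,
        # given the choices made so far.
        term2 = 0.0
        for e in universe:
            if any(decided.get(i) == 1 for i, s in enumerate(sets) if e in s):
                continue                      # e is already covered
            prob_uncovered = 1.0
            for i, s in enumerate(sets):
                if e in s and i not in decided:
                    prob_uncovered *= 1.0 - p[i]
            term2 += prob_uncovered
        return term1 + term2

    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
    costs = [1.0, 1.0, 1.0, 1.0]
    x_star = [0.5, 0.5, 0.5, 0.5]
    print(cond_expectation(sets, costs, x_star, {0, 1, 2, 3}, decided={}))  # E[F], about 0.48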

Keeping the conditional probability of failure below 1

To keep the conditional probability of failure below 1, it suffices to keep the conditional expectation of $F$ below 1. To do this, it suffices to keep the conditional expectation of $F$ from increasing. This is what the algorithm will do. It will set $x'_{s_t}$ in each iteration $t$ to ensure that

$$E[F \mid x^{(t)}] \;\le\; E[F \mid x^{(t-1)}],$$

and hence

$$E[F \mid x^{(m)}] \;\le\; E[F \mid x^{(m-1)}] \;\le\; \cdots \;\le\; E[F \mid x^{(0)}] \;=\; E[F] \;<\; 1$$

(where $x^{(0)}$ is the empty assignment, before any set has been considered, and $x^{(m)} = x'$).

In the $t$th iteration, how can the algorithm set $x'_{s_t}$ to ensure that $E[F \mid x^{(t)}] \le E[F \mid x^{(t-1)}]$? It turns out that it can simply set $x'_{s_t}$ so as to minimize the resulting value of $E[F \mid x^{(t)}]$.

To see why, focus on the point in time when iteration $t$ starts. At that time, $E[F \mid x^{(t-1)}]$ is determined, but $E[F \mid x^{(t)}]$ is not yet determined -- it can take two possible values depending on how $x'_{s_t}$ is set in iteration $t$. Let $E^{(t-1)}$ denote the value of $E[F \mid x^{(t-1)}]$. Let $E^{(t)}_0$ and $E^{(t)}_1$ denote the two possible values of $E[F \mid x^{(t)}]$, depending on whether $x'_{s_t}$ is set to 0 or to 1, respectively. By the definition of conditional expectation,

$$E^{(t-1)} \;=\; \Pr[x'_{s_t} = 0]\; E^{(t)}_0 \;+\; \Pr[x'_{s_t} = 1]\; E^{(t)}_1.$$

Since a weighted average of two quantities is always at least the minimum of those two quantities, it follows that

$$E^{(t-1)} \;\ge\; \min\!\big(E^{(t)}_0,\, E^{(t)}_1\big).$$

Thus, setting $x'_{s_t}$ so as to minimize the resulting value of $E[F \mid x^{(t)}]$ will guarantee that $E[F \mid x^{(t)}] \le E[F \mid x^{(t-1)}]$. This is what the algorithm will do.

In detail, what does this mean? Considered as a function of $x'_{s_t}$ (with all other quantities fixed), $E[F \mid x^{(t)}]$ is a linear function of $x'_{s_t}$, and the coefficient of $x'_{s_t}$ in that function is

$$\frac{c(s_t)}{2\lambda\, c\cdot x^*} \;-\; \sum_{e \in s_t \cap\, \mathcal U^{(t-1)}} \;\prod_{s \ni e,\; s \notin \mathcal S^{(t)}} (1 - p_s).$$

Thus, the algorithm should set $x'_{s_t}$ to 0 if this expression is positive, and to 1 otherwise. This gives the following algorithm.

Randomized-rounding algorithm for set cover

input: set system $\mathcal S$, universe $\mathcal U$, cost vector $c$

output: set cover $x'$ (a solution to the standard integer linear program for set cover)

  1. Compute a min-cost fractional set cover $x^*$ (an optimal solution to the LP relaxation).
  2. Let $\lambda \leftarrow \ln(2|\mathcal U|)$. Let $p_s \leftarrow \min(\lambda x^*_s, 1)$ for each $s \in \mathcal S$.
  3. For each $s \in \mathcal S$ do:
    1. Let $\mathcal S \leftarrow \mathcal S - \{s\}$.   ($\mathcal S$ contains the not-yet-decided sets.)
    2. If $\dfrac{c(s)}{2\lambda\, c\cdot x^*} > \displaystyle\sum_{e \in s \cap \mathcal U} \;\prod_{s' \in \mathcal S,\; s' \ni e} (1 - p_{s'})$
      then set $x'_s \leftarrow 0$,
      else set $x'_s \leftarrow 1$ and $\mathcal U \leftarrow \mathcal U - s$.
        ($\mathcal U$ contains the not-yet-covered elements.)
  4. Return $x'$.
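
For concreteness, the following Python sketch transcribes the algorithm directly (our own illustrative code; in practice $x^*$ would come from an LP solver, as in the earlier sketch).

    # Sketch: derandomized rounding for set cover via the method of
    # conditional probabilities.  x_star is a min-cost fractional cover.
    import math

    def derandomized_round(sets, costs, x_star, universe):
        lam = math.log(2 * len(universe))
        lp_cost = sum(c * xs for c, xs in zip(costs, x_star))   # c . x*
        p = [min(lam * xs, 1.0) for xs in x_star]

        undecided = set(range(len(sets)))   # indices of not-yet-decided sets
        uncovered = set(universe)           # not-yet-covered elements
        x_prime = [0] * len(sets)

        for i, s in enumerate(sets):
            undecided.discard(i)
            # For each still-uncovered element of s, the probability that no
            # remaining undecided set would cover it.  This sum is the second
            # half of the coefficient of x'_s in E[F | x^(t)].
            coverage_term = 0.0
            for e in s & uncovered:
                prob_uncovered = 1.0
                for j in undecided:
                    if e in sets[j]:
                        prob_uncovered *= 1.0 - p[j]
                coverage_term += prob_uncovered
            # Take s exactly when doing so does not increase E[F | x^(t)].
            if costs[i] / (2 * lam * lp_cost) <= coverage_term:
                x_prime[i] = 1
                uncovered -= s
        return x_prime

    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
    costs = [1.0, 1.0, 1.0, 1.0]
    x_star = [0.5, 0.5, 0.5, 0.5]
    print(derandomized_round(sets, costs, x_star, {0, 1, 2, 3}))  # [0, 1, 0, 1]

On this toy instance the returned cover ({1, 2} and {0, 3}) has cost 2, which here equals the LP lower bound $c \cdot x^*$.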

Lemma (approximation guarantee for algorithm)

The algorithm above returns a set cover $x'$ of cost at most $2\ln(2|\mathcal U|)$ times the minimum cost of any (fractional) set cover.

Proof

The algorithm ensures that the conditional expectation of $F$, $E[F \mid x^{(t)}]$, does not increase at each iteration. Since this conditional expectation is initially less than 1 (as shown previously), it stays below 1 throughout. Since the conditional probability of failure is at most the conditional expectation of $F$, the conditional probability of failure also stays below 1. Thus, at the end, when all choices are determined, the algorithm has reached a successful outcome: it returns a set cover $x'$ of cost at most $2\ln(2|\mathcal U|)$ times the minimum cost of any (fractional) set cover. QED

Remarks

In the example above, the algorithm was guided by the conditional expectation of a random variable $F$. In some cases, instead of an exact conditional expectation, an upper bound (or sometimes a lower bound) on some conditional expectation is used. Such a bound is called a pessimistic estimator.

Comparison to other applications of the probabilistic method

The randomized rounding step differs from most applications of the probabilistic method in two respects:

  1. The computational complexity of the rounding step is important. It should be implementable by a fast (e.g. polynomial time) algorithm.
  2. The probability distribution underlying the random experiment is a function of the solution of a relaxation of the problem instance. This fact is crucial to proving the performance guarantee of the approximation algorithm -- that is, that for any problem instance, the algorithm returns a solution that approximates the optimal solution for that specific instance. In comparison, applications of the probabilistic method in combinatorics typically show the existence of structures whose features depend on other parameters of the input. For example, consider Turán's theorem, which can be stated as "any graph with $n$ vertices of average degree $d$ must have an independent set of size at least $n/(d+1)$." (The method of conditional probabilities yields a probabilistic proof of Turán's theorem.) While there are graphs for which this bound is tight, there are also graphs which have independent sets much larger than $n/(d+1)$. Thus, the size of the independent set shown to exist by Turán's theorem in a graph may, in general, be much smaller than the maximum independent set of that graph.


References

  1. Raghavan, Prabhakar; Tompson, Clark D. (1987), "Randomized rounding: A technique for provably good algorithms and algorithmic proofs", Combinatorica, 7 (4): 365–374, doi:10.1007/BF02579324, S2CID 5749936.
  2. Motwani, Rajeev; Raghavan, Prabhakar (1995). Randomized Algorithms. Cambridge University Press. ISBN 978-0-521-47465-8.
  3. Vazirani, Vijay (2002). Approximation Algorithms. Springer Verlag. ISBN 978-3-540-65367-7.
  4. Young, Neal E. (2002). "Randomized Rounding without Solving the Linear Program". arXiv:cs/0205036.
  5. Young, Neal. "Oblivious randomized rounding". AlgNotes. Retrieved 2023-09-14.
