Smoothed analysis

A randomly generated bitmap does not resemble typical pictures.
A typical picture does not resemble a random bitmap.

In theoretical computer science, smoothed analysis is a way of measuring the complexity of an algorithm. Since its introduction in 2001, smoothed analysis has been used as a basis for considerable research on problems ranging from mathematical programming and numerical analysis to machine learning and data mining. [1] It can give a more realistic analysis of the practical performance (e.g., running time, success rate, approximation quality) of an algorithm than analysis based on worst-case or average-case scenarios.


Smoothed analysis is a hybrid of worst-case and average-case analyses that inherits advantages of both. It measures the expected performance of algorithms under slight random perturbations of worst-case inputs. If the smoothed complexity of an algorithm is low, then it is unlikely that the algorithm will take a long time to solve practical instances whose data are subject to slight noise and imprecision. Smoothed complexity results are strong probabilistic results, roughly stating that, in every large enough neighbourhood of the space of inputs, most inputs are easily solvable. Thus, a low smoothed complexity means that the hardness of inputs is a "brittle" property.

Although worst-case complexity has been widely successful in explaining the practical performance of many algorithms, this style of analysis gives misleading results for a number of problems. Worst-case complexity measures the time it takes to solve any input, although hard-to-solve inputs might never come up in practice. In such cases, the worst-case running time can be much worse than the observed running time in practice. For example, the worst-case complexity of solving a linear program using the simplex algorithm is exponential, [2] although the observed number of steps in practice is roughly linear. [3] [4] The simplex algorithm is in fact much faster than the ellipsoid method in practice, although the latter has polynomial-time worst-case complexity.

Average-case analysis was first introduced to overcome the limitations of worst-case analysis. However, the resulting average-case complexity depends heavily on the probability distribution that is chosen over the input. The actual inputs and distribution of inputs may be different in practice from the assumptions made during the analysis: a random input may be very unlike a typical input. Because of this choice of data model, a theoretical average-case result might say little about practical performance of the algorithm.

Smoothed analysis generalizes both worst-case and average-case analysis and inherits strengths of both. It is intended to be much more general than average-case complexity, while still allowing low complexity bounds to be proven.
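The difference between the three measures can be illustrated with a toy experiment. The sketch below is not taken from the literature cited here: it assumes a quicksort variant that always picks the first element as pivot, uses the number of comparisons as the cost, and estimates the expected cost after Gaussian perturbation of a single adversarial (already sorted) input. The names `quicksort_comparisons` and `smoothed_cost` are hypothetical. A true smoothed complexity would additionally take the maximum over all base inputs, which this sketch does not attempt.

```python
import random


def quicksort_comparisons(values):
    """Count comparisons made by quicksort that always picks the first element as pivot."""
    comparisons = 0
    stack = [list(values)]
    while stack:
        part = stack.pop()
        if len(part) <= 1:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # every other element is compared with the pivot
        stack.append([v for v in rest if v < pivot])
        stack.append([v for v in rest if v >= pivot])
    return comparisons


def smoothed_cost(base, sigma, trials, rng):
    """Monte Carlo estimate of the expected cost after Gaussian perturbation of `base`."""
    total = 0
    for _ in range(trials):
        perturbed = [x + rng.gauss(0.0, sigma) for x in base]
        total += quicksort_comparisons(perturbed)
    return total / trials


rng = random.Random(0)
n = 200
sorted_input = [i / n for i in range(n)]  # adversarial input for this pivot rule

# Worst case: the sorted input forces maximally unbalanced partitions.
worst_case = quicksort_comparisons(sorted_input)

# Average case: inputs drawn uniformly at random (an assumed input distribution).
average_case = sum(
    quicksort_comparisons([rng.random() for _ in range(n)]) for _ in range(20)
) / 20

# Smoothed: the same adversarial input, slightly perturbed (sigma is an assumed value).
smoothed = smoothed_cost(sorted_input, sigma=0.1, trials=20, rng=rng)

print(worst_case, average_case, smoothed)
```

On the sorted input the cost is exactly n(n-1)/2 comparisons, whereas the perturbed copies of the same input are almost never perfectly sorted and therefore cost less, which is the "brittleness" of hard inputs described above.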

History

ACM and the European Association for Theoretical Computer Science awarded the 2008 Gödel Prize to Daniel Spielman and Shang-Hua Teng for developing smoothed analysis. The name "smoothed analysis" was coined by Alan Edelman. [1] In 2010 Spielman received the Nevanlinna Prize for developing smoothed analysis. Spielman and Teng's JACM paper "Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time" was also one of the three winners of the 2009 Fulkerson Prize sponsored jointly by the Mathematical Programming Society (MPS) and the American Mathematical Society (AMS).

Examples

Simplex algorithm for linear programming

The simplex algorithm is very efficient in practice and is one of the dominant algorithms for linear programming. On practical problems, the number of steps taken by the algorithm is linear in the number of variables and constraints. [3] [4] Yet in the theoretical worst case it takes exponentially many steps for most successfully analyzed pivot rules. This was one of the main motivations for developing smoothed analysis. [5]

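For the perturbation model, we assume that the input data is perturbed by noise from a Gaussian distribution. For normalization purposes, we assume the unperturbed data $\bar{\mathbf{A}} \in \mathbb{R}^{n \times d}$, $\bar{\mathbf{b}} \in \mathbb{R}^{n}$, $\mathbf{c} \in \mathbb{R}^{d}$ satisfies $\|(\bar{\mathbf{a}}_i, \bar{b}_i)\|_2 \leq 1$ for all rows $(\bar{\mathbf{a}}_i, \bar{b}_i)$ of the matrix $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$. The noise $(\hat{\mathbf{A}}, \hat{\mathbf{b}})$ has independent entries sampled from a Gaussian distribution with mean $0$ and standard deviation $\sigma$. We set $\mathbf{A} = \bar{\mathbf{A}} + \hat{\mathbf{A}}$ and $\mathbf{b} = \bar{\mathbf{b}} + \hat{\mathbf{b}}$. The smoothed input data consists of the linear program

maximize $\mathbf{c}^{T} \mathbf{x}$
subject to $\mathbf{A} \mathbf{x} \leq \mathbf{b}$.

If the running time of our algorithm on data $\mathbf{A}, \mathbf{b}, \mathbf{c}$ is given by $T(\mathbf{A}, \mathbf{b}, \mathbf{c})$, then the smoothed complexity of the simplex method is [6]

$$C_s(n, d, \sigma) := \max_{\bar{\mathbf{A}}, \bar{\mathbf{b}}, \mathbf{c}} \mathbb{E}_{\hat{\mathbf{A}}, \hat{\mathbf{b}}}\left[T(\mathbf{A}, \mathbf{b}, \mathbf{c})\right] = \mathrm{poly}(d, \log n, \sigma^{-1}).$$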

This bound holds for a specific pivot rule called the shadow vertex rule. The shadow vertex rule is slower than more commonly used pivot rules such as Dantzig's rule or the steepest edge rule, [7] but it has properties that make it very well-suited to probabilistic analysis. [8]
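The Gaussian perturbation model used for the simplex analysis above can be sketched in a few lines. This is only an illustration of how a smoothed instance is generated, not of the simplex method or of the analysis itself. The function names `normalize_rows` and `smooth_lp`, the example data, and the value of `sigma` are assumptions made for the sketch; the constraint data is held as plain Python lists.

```python
import math
import random


def normalize_rows(A_bar, b_bar):
    """Rescale the data by one common factor so every row (a_i, b_i) has norm at most 1."""
    largest = max(
        math.sqrt(sum(a * a for a in row) + b * b)
        for row, b in zip(A_bar, b_bar)
    )
    scale = 1.0 if largest <= 1.0 else 1.0 / largest
    # A common positive factor leaves the feasible region {x : A x <= b} unchanged.
    return ([[a * scale for a in row] for row in A_bar],
            [b * scale for b in b_bar])


def smooth_lp(A_bar, b_bar, sigma, rng):
    """Add independent Gaussian noise (mean 0, standard deviation sigma) to every entry."""
    A = [[a + rng.gauss(0.0, sigma) for a in row] for row in A_bar]
    b = [v + rng.gauss(0.0, sigma) for v in b_bar]
    return A, b


# Hypothetical unperturbed constraint data with three constraints and two variables.
A_bar = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
b_bar = [12.0, 2.0, 1.0]

A_norm, b_norm = normalize_rows(A_bar, b_bar)
A, b = smooth_lp(A_norm, b_norm, sigma=0.01, rng=random.Random(1))
```

The objective vector is left untouched, since in this model only the constraint matrix and the right-hand side are perturbed.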

Local search for combinatorial optimization

A number of local search algorithms have bad worst-case running times but perform well in practice. [9]

One example is the 2-opt heuristic for the traveling salesman problem. It can take exponentially many iterations until it finds a locally optimal solution, although in practice the running time is subquadratic in the number of vertices. [10] The approximation ratio, which is the ratio between the length of the output of the algorithm and the length of the optimal solution, tends to be good in practice but can also be bad in the theoretical worst case.

One class of problem instances can be given by $n$ points in the box $[0,1]^d$, where their pairwise distances come from a norm. Already in two dimensions, the 2-opt heuristic might take exponentially many iterations until finding a local optimum. In this setting, one can analyze the perturbation model where the vertices $v_1, \dots, v_n$ are independently sampled according to probability distributions with probability density functions $f_1, \dots, f_n : [0,1]^d \rightarrow [0, \theta]$. For $\theta = 1$, the points are uniformly distributed. When $\theta > 1$ is big, the adversary has more ability to increase the likelihood of hard problem instances. In this perturbation model, the expected number of iterations of the 2-opt heuristic, as well as the approximation ratios of resulting output, are bounded by polynomial functions of $n$ and $\theta$. [10]
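For concreteness, the 2-opt heuristic discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: points are sampled uniformly from the unit square (the simplest case of the perturbation model), distances are Euclidean, and the first improving move found is applied. The names `tour_length` and `two_opt` and the small tolerance are choices made for this sketch, not part of the cited analysis.

```python
import math
import random


def tour_length(points, tour):
    """Total Euclidean length of the closed tour."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))


def two_opt(points, tour):
    """Apply improving 2-opt moves until none remains; return (tour, number of moves)."""
    tour = list(tour)
    n = len(tour)
    moves = 0
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a vertex
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Replace edges (a, b) and (c, d) by (a, c) and (b, d) if that is shorter.
                gain = (math.dist(a, b) + math.dist(c, d)
                        - math.dist(a, c) - math.dist(b, d))
                if gain > 1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    moves += 1
                    improved = True
    return tour, moves


rng = random.Random(0)
n = 30
points = [(rng.random(), rng.random()) for _ in range(n)]  # uniform in the unit square
start = list(range(n))
initial_length = tour_length(points, start)
final_tour, moves = two_opt(points, start)
final_length = tour_length(points, final_tour)
print(moves, initial_length, final_length)
```

Each applied move strictly shortens the tour, so the loop terminates at a locally optimal tour; `moves` is the quantity whose expectation the smoothed analysis bounds.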

Another local search algorithm for which smoothed analysis was successful is the k-means method. Given $n$ points in $[0,1]^d$, it is NP-hard to find a good partition into clusters with small pairwise distances between points in the same cluster. Lloyd's algorithm is widely used and very fast in practice, although it can take $2^{\Omega(n)}$ iterations in the worst case to find a locally optimal solution. However, assuming that the points have independent Gaussian distributions, each with expectation in $[0,1]^d$ and standard deviation $\sigma$, the expected number of iterations of the algorithm is bounded by a polynomial in $n$, $d$ and $1/\sigma$. [11]
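The setting of the k-means result can be sketched in the same way. The code below is a minimal version of Lloyd's algorithm run on Gaussian-perturbed points; it assumes two-dimensional data, uses the first `k` points as initial centers, and keeps an old center when a cluster becomes empty. These choices, the function name `lloyd`, and the values of `sigma` and `k` are assumptions for illustration only.

```python
import random


def lloyd(points, initial_centers, max_iter=1000):
    """Run Lloyd's algorithm; return (centers, iterations until assignments stop changing)."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center (squared distance).
        new_assignment = [
            min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_assignment == assignment:
            return centers, iteration - 1  # local optimum reached
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, max_iter


rng = random.Random(2)
sigma, k = 0.05, 3
base = [[rng.random(), rng.random()] for _ in range(60)]             # means in [0,1]^2
points = [[x + rng.gauss(0.0, sigma) for x in p] for p in base]       # Gaussian perturbation
centers, iterations = lloyd(points, points[:k])
print(iterations)
```

The returned iteration count is the quantity that the smoothed analysis bounds in expectation.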

References

  1. Spielman, Daniel; Teng, Shang-Hua (2009), "Smoothed analysis: an attempt to explain the behavior of algorithms in practice" (PDF), Communications of the ACM, ACM, 52 (10): 76–84, doi:10.1145/1562764.1562785, S2CID 7904807
  2. Amenta, Nina; Ziegler, Günter (1999), "Deformed products and maximal shadows of polytopes", Contemporary Mathematics, American Mathematical Society, 223: 10–19, CiteSeerX 10.1.1.80.3241, doi:10.1090/conm/223, ISBN 9780821806746, MR 1661377
  3. Shamir, Ron (1987), "The Efficiency of the Simplex Method: A Survey", Management Science, 33 (3): 301–334, doi:10.1287/mnsc.33.3.301
  4. Andrei, Neculai (2004), "On the complexity of MINOS package for linear programming", Studies in Informatics and Control, 13 (1): 35–46
  5. Spielman, Daniel; Teng, Shang-Hua (2001), "Smoothed analysis of algorithms", Proceedings of the thirty-third annual ACM symposium on Theory of computing, ACM, pp. 296–305, arXiv:cs/0111050, Bibcode:2001cs.......11050S, doi:10.1145/380752.380813, ISBN 978-1-58113-349-3, S2CID 1471
  6. Dadush, Daniel; Huiberts, Sophie (2018), "A friendly smoothed analysis of the simplex method", Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 390–403, arXiv:1711.05667, doi:10.1145/3188745.3188826, ISBN 9781450355599, S2CID 11868079
  7. Borgwardt, Karl-Heinz; Damm, Renate; Donig, Rudolf; Joas, Gabriele (1993), "Empirical studies on the average efficiency of simplex variants under rotation symmetry", ORSA Journal on Computing, Operations Research Society of America, 5 (3): 249–260, doi:10.1287/ijoc.5.3.249
  8. Borgwardt, Karl-Heinz (1987), The Simplex Method: A Probabilistic Analysis, Algorithms and Combinatorics, vol. 1, Springer-Verlag, doi:10.1007/978-3-642-61578-8, ISBN 978-3-540-17096-9
  9. Manthey, Bodo (2021), Roughgarden, Tim (ed.), "Smoothed Analysis of Local Search", Beyond the Worst-Case Analysis of Algorithms, Cambridge: Cambridge University Press, pp. 285–308, doi:10.1017/9781108637435.018, ISBN 978-1-108-49431-1, S2CID 221680879, retrieved 2022-06-15
  10. Englert, Matthias; Röglin, Heiko; Vöcking, Berthold (2007), "Worst Case and Probabilistic Analysis of the 2-Opt Algorithm for the TSP", Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 68: 190–264, arXiv:2302.06889, doi:10.1007/s00453-013-9801-4
  11. Arthur, David; Manthey, Bodo; Röglin, Heiko (2011), "Smoothed Analysis of the k-Means Method" (PDF), Journal of the ACM, 58 (5): 1–31, doi:10.1145/2027216.2027217, S2CID 5253105