Fixed-point computation

Fixed-point computation refers to the process of computing an exact or approximate fixed point of a given function. [1] In its most common form, the given function f satisfies the conditions of the Brouwer fixed-point theorem: that is, f is continuous and maps the unit d-cube to itself. The Brouwer fixed-point theorem guarantees that f has a fixed point, but the proof is not constructive. Various algorithms have been devised for computing an approximate fixed point. Such algorithms are used in economics for computing a market equilibrium, in game theory for computing a Nash equilibrium, and in dynamic system analysis.

Definitions

Figure: the graph of an example function with three fixed points.

The unit interval is denoted by E = [0, 1], and the unit d-dimensional cube is denoted by E^d. A continuous function f is defined on E^d (from E^d to itself). Often, it is assumed that f is not only continuous but also Lipschitz continuous, that is, for some constant L, |f(x) − f(y)| ≤ L·|x − y| for all x, y in E^d.

A fixed point of f is a point x in E^d such that f(x) = x. By the Brouwer fixed-point theorem, any continuous function from E^d to itself has a fixed point. But for general functions, it is impossible to compute a fixed point precisely, since it can be an arbitrary real number. Fixed-point computation algorithms therefore look for approximate fixed points. Several common criteria for an approximate fixed point are: [2]

  • The residual criterion: given an approximation parameter ε > 0, an ε-residual fixed point of f is a point x in E^d such that |f(x) − x| ≤ ε, where |·| denotes some norm (usually the maximum norm).
  • The absolute criterion: given an approximation parameter δ > 0, a δ-absolute fixed point of f is a point x in E^d such that |x − x₀| ≤ δ, where x₀ is some fixed point of f.
  • The relative criterion: given an approximation parameter δ > 0, a δ-relative fixed point of f is a point x in E^d such that |x − x₀| / |x₀| ≤ δ, where x₀ is some fixed point of f.

For Lipschitz-continuous functions, the absolute criterion is stronger than the residual criterion: if f is Lipschitz-continuous with constant L, then |x − x₀| ≤ δ implies |f(x) − f(x₀)| ≤ L·δ. Since x₀ is a fixed point of f, this implies |f(x) − x₀| ≤ L·δ, so |f(x) − x| ≤ |f(x) − x₀| + |x₀ − x| ≤ (L + 1)·δ. Therefore, a δ-absolute fixed point is also an ε-residual fixed point with ε = (L + 1)·δ.
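
These criteria can be checked mechanically. Below is a minimal Python sketch, assuming the maximum norm; the function names and the oracle f are illustrative, not from the literature:

    import numpy as np

    def is_residual_fixed_point(f, x, eps):
        """Residual criterion: ||f(x) - x||_inf <= eps."""
        return np.max(np.abs(f(x) - x)) <= eps

    def residual_from_absolute(L, delta):
        """By the argument above, a delta-absolute fixed point of an
        L-Lipschitz function is also an eps-residual fixed point with
        eps = (L + 1) * delta."""
        return (L + 1) * delta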

The most basic step of a fixed-point computation algorithm is an evaluation query: given any x in E^d, the algorithm is provided with an oracle that returns the value f(x). The accuracy of the approximate fixed point depends upon the error of this oracle. The run-time complexity of an algorithm is usually measured by the number of required evaluation queries.

Contractive functions

A Lipschitz-continuous function with constant L is called contractive if L < 1; it is called weakly-contractive if L ≤ 1. Every contractive function satisfying Brouwer's conditions has a unique fixed point. Moreover, fixed-point computation for contractive functions is easier than for general functions.

Figure: computing a fixed point using function iteration.

The first algorithm for fixed-point computation was the fixed-point iteration algorithm of Banach. Banach's fixed-point theorem implies that, when fixed-point iteration is applied to a contraction mapping, the error after t iterations is in O(L^t). Therefore, the number of evaluations required for a δ-relative fixed point is approximately log_L(δ) = log(δ) / log(L) = log(1/δ) / log(1/L). Sikorski and Wozniakowski [4] showed that Banach's algorithm is optimal when the dimension is large: when d ≥ log(1/δ) / log(1/L), the number of evaluations required by any algorithm for a δ-relative fixed point is at least 50% of the number required by the iteration algorithm. Note that as L approaches 1, the number of evaluations approaches infinity. No finite algorithm can compute a δ-absolute fixed point for all functions with L = 1. [5]
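
As an illustration, Banach's iteration takes only a few lines. The following is a minimal one-dimensional Python sketch, assuming a contraction f with known constant L < 1; the stopping rule uses the standard a-posteriori bound from Banach's theorem:

    def fixed_point_iteration(f, x0, L, delta):
        """Iterate x <- f(x) on a contraction with constant L < 1.
        Banach's theorem gives |x_t - x*| <= L/(1-L) * |x_t - x_{t-1}|,
        so the loop stops once a delta-absolute fixed point is certain."""
        x = x0
        while True:
            x_next = f(x)
            if L / (1 - L) * abs(x_next - x) <= delta:
                return x_next
            x = x_next

    # Example: f(x) = 0.5*cos(x) maps [0, 1] to itself with L = 0.5:
    # fixed_point_iteration(lambda x: 0.5 * math.cos(x), 0.5, 0.5, 1e-9)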

When L < 1 and d = 1, the optimal algorithm is the Fixed Point Envelope (FPE) algorithm of Sikorski and Wozniakowski. [4] It finds a δ-relative fixed point and a δ-absolute fixed point using a number of queries that is only logarithmic in 1/δ; this is faster than the fixed-point iteration algorithm, whose query count grows without bound as L approaches 1. [6]

When d > 1 but not too large, and L ≤ 1, the optimal algorithm is the interior-ellipsoid algorithm (based on the ellipsoid method). [7] It finds an ε-residual fixed point using O(d·log(1/ε)) evaluations. When L < 1, it finds a δ-absolute fixed point using O(d·log(1/(δ(1−L)))) evaluations.

Shellman and Sikorski [8] presented an algorithm called BEFix (Bisection Envelope Fixed-point) for computing an ε-residual fixed point of a two-dimensional function with L ≤ 1, using only 2⌈log₂(1/ε)⌉ + 1 queries. They later [9] presented an improvement called BEDFix (Bisection Envelope Deep-cut Fixed-point), with the same worst-case guarantee but better empirical performance. When L < 1, BEDFix can also compute a δ-absolute fixed point, using O(log(1/(δ(1−L)))) queries.

Shellman and Sikorski [2] presented an algorithm called PFix for computing an ε-residual fixed point of a d-dimensional function with L ≤ 1, using O(log^d(1/ε)) queries. When L < 1, PFix can be executed with ε = (1−L)·δ, and in that case it computes a δ-absolute fixed point, using O(log^d(1/((1−L)δ))) queries. It is more efficient than the iteration algorithm when L is close to 1. The algorithm is recursive: it handles a d-dimensional function by recursive calls on (d−1)-dimensional functions.

Algorithms for differentiable functions

When the function f is differentiable, and the algorithm can evaluate its derivative (not only f itself), the Newton method can be used, and it is much faster. [10] [11]
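
For instance, in one dimension the fixed-point equation f(x) = x can be rewritten as g(x) = f(x) − x = 0 and handed to Newton's method. A minimal sketch, assuming f is differentiable with f′(x) ≠ 1 near the fixed point (the function names are illustrative); in d dimensions the division becomes solving a linear system with the Jacobian J_f(x) − I:

    def newton_fixed_point(f, f_prime, x0, eps, max_iter=100):
        """Apply Newton's method to g(x) = f(x) - x, with g'(x) = f'(x) - 1,
        until the residual criterion |f(x) - x| <= eps is met."""
        x = x0
        for _ in range(max_iter):
            g = f(x) - x
            if abs(g) <= eps:
                return x
            x -= g / (f_prime(x) - 1.0)
        raise RuntimeError("no eps-residual fixed point found in max_iter steps")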

General functions

For functions with Lipschitz constant L > 1, computing a fixed point is much harder.

One dimension

For a 1-dimensional function (d = 1), a δ-absolute fixed point can be found using O(log(1/δ)) queries using the bisection method: start with the interval E = [0, 1]; at each iteration, let x be the center of the current interval, and compute f(x); if f(x) > x, recurse on the sub-interval to the right of x; otherwise, recurse on the sub-interval to the left of x. Note that the current interval always contains a fixed point, so after O(log(1/δ)) queries, any point in the remaining interval is a δ-absolute fixed point of f. Setting δ = ε/(L + 1), where L is the Lipschitz constant, gives an ε-residual fixed point, using O(log((L + 1)/ε)) queries. [3]
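
A minimal Python sketch of this bisection scheme (the function name is illustrative):

    def bisection_fixed_point(f, delta):
        """delta-absolute fixed point of a continuous f: [0,1] -> [0,1].
        Invariant: f(lo) >= lo and f(hi) <= hi, so [lo, hi] always
        contains a fixed point by the intermediate value theorem."""
        lo, hi = 0.0, 1.0
        while hi - lo > 2 * delta:
            mid = (lo + hi) / 2
            if f(mid) > mid:
                lo = mid   # a fixed point lies to the right of mid
            else:
                hi = mid   # a fixed point lies to the left of mid
        return (lo + hi) / 2  # within delta of a fixed point in [lo, hi]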

Two or more dimensions

For functions in two or more dimensions, the problem is much more challenging. Shellman and Sikorski [2] proved that for any integers d ≥ 2 and L > 1, finding a δ-absolute fixed point of d-dimensional L-Lipschitz functions might require infinitely many evaluations. The proof idea is as follows. For any integer T > 1 and any sequence of T evaluation queries (possibly adaptive), one can construct two functions that are Lipschitz-continuous with constant L and yield the same answers to all these queries, but one of them has a unique fixed point at (x, 0) and the other has a unique fixed point at (x, 1). Any algorithm using T evaluations cannot differentiate between these functions, so it cannot find a δ-absolute fixed point. This is true for any finite integer T.

Several algorithms based on function evaluations have been developed for finding an ε-residual fixed point:

  • The first algorithm to approximate a fixed point of a general function was developed by Herbert Scarf in 1967. [12] [13] It is based on constructing "primitive sets" and tracing a path of fully-labeled sets.
  • A later algorithm by Harold Kuhn [14] used simplices and simplicial partitions instead of primitive sets.
  • Developing the simplicial approach further, Orin Harrison Merrill [15] presented the restart algorithm.
  • B. Curtis Eaves [16] presented a homotopy algorithm: it gradually deforms a simple function with a known fixed point into f, tracking the fixed points along the way.
  • David Gale [17] showed the connection between Brouwer fixed points and the game of Hex, which yields another computational approach.

In the worst case, the number of function evaluations required by all these algorithms is exponential in the binary representation of the accuracy, that is, in Ω(1/ε).

Query complexity

Hirsch, Papadimitriou and Vavasis proved [3] that any algorithm based on function evaluations that finds an ε-residual fixed point of f requires Ω(L′/ε) function evaluations, where L′ is the Lipschitz constant of the function f(x) − x (note that L − 1 ≤ L′ ≤ L + 1). More precisely:

  • For a 2-dimensional function (d = 2), they prove a tight bound Θ(L′/ε).
  • For any d ≥ 3, finding an ε-residual fixed point of a d-dimensional function requires Ω((L′/ε)^(d−2)) queries and O((L′/ε)^d) queries.

The latter result leaves a gap in the exponent. Chen and Deng [18] closed the gap. They proved that, for any d ≥ 2, and for L/ε sufficiently large, the number of queries required for computing an ε-residual fixed point is in Θ((L/ε)^(d−1)).

Discrete fixed-point computation

A discrete function is a function defined on a subset of Z^d (the d-dimensional integer grid). There are several discrete fixed-point theorems, stating conditions under which a discrete function has a fixed point. For example, the Iimura–Murota–Tamura theorem states that (in particular) if f is a function from a rectangular subset of Z^d to itself, and f is hypercubic direction-preserving, then f has a fixed point.

Let f be a direction-preserving function from the integer cube {1, …, n}^d to itself. Chen and Deng [18] prove that, for any d ≥ 2 and n > 48d, computing a fixed point of such a function requires Θ(n^(d−1)) function evaluations.

Chen and Deng [19] define a different discrete fixed-point problem, which they call 2D-BROUWER. It considers a discrete function f on {0, …, n}² such that, for every x on the grid, f(x) − x is either (0, 1) or (1, 0) or (−1, −1). The goal is to find a square in the grid in which all three labels occur. The function must map the square to itself, so it must map the lines x = 0 and y = 0 to either (0, 1) or (1, 0); the line x = n to either (−1, −1) or (0, 1); and the line y = n to either (−1, −1) or (1, 0). The problem can be reduced to 2D-SPERNER (computing a fully-labeled triangle in a triangulation satisfying the conditions of Sperner's lemma), and is therefore PPAD-complete. This implies that computing an approximate fixed point is PPAD-complete even for very simple functions.
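
The following brute-force Python sketch illustrates the search problem; the oracle label, returning f(x) − x at a grid point, is hypothetical. Exhaustive search makes Θ(n²) queries, and the PPAD-completeness result suggests that no algorithm does substantially better in the worst case:

    def find_trichromatic_square(label, n):
        """Scan all unit squares of the grid {0,...,n}^2 for one whose
        corners carry all three labels (0,1), (1,0), (-1,-1)."""
        targets = {(0, 1), (1, 0), (-1, -1)}
        for x in range(n):
            for y in range(n):
                corners = {label(x, y), label(x + 1, y),
                           label(x, y + 1), label(x + 1, y + 1)}
                if targets <= corners:
                    return (x, y)  # lower-left corner of a trichromatic square
        return None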

Relation between fixed-point computation and root-finding algorithms

Given a function g from E^d to R, a root of g is a point x in E^d such that g(x) = 0. An ε-root of g is a point x in E^d such that |g(x)| ≤ ε.

Fixed-point computation is a special case of root-finding: given a function f on E^d, define g(x) := |f(x) − x|. Then x is a fixed point of f if and only if x is a root of g, and x is an ε-residual fixed point of f if and only if x is an ε-root of g. Therefore, any root-finding algorithm (an algorithm that computes an approximate root of a function) can be used to find an approximate fixed point.
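
A short Python sketch of this reduction, assuming numpy arrays and the maximum norm (the function name is illustrative):

    import numpy as np

    def fixed_point_to_root(f):
        """g(x) = ||f(x) - x||_inf vanishes exactly at fixed points of f,
        and an eps-root of g is an eps-residual fixed point of f."""
        return lambda x: np.max(np.abs(f(x) - x))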

The opposite is not true: finding an approximate root of a general function may be harder than finding an approximate fixed point. In particular, Sikorski [20] proved that finding an ε-root requires Ω((L/ε)^d) function evaluations. This gives an exponential lower bound even for a one-dimensional function (in contrast, an ε-residual fixed point of a one-dimensional function can be found using O(log((L + 1)/ε)) queries using the bisection method). Here is a proof sketch. [3]: 35 Construct a function g that is slightly larger than ε everywhere in E^d except in some small cube around some point x₀, where x₀ is the unique root of g. If g is Lipschitz continuous with constant L, then the cube around x₀ can have a side length proportional to ε/L. Any algorithm that finds an ε-root of g must check a set of cubes that covers the entire domain; the number of such cubes is at least on the order of (L/ε)^d.

However, there are classes of functions for which finding an approximate root is equivalent to finding an approximate fixed point. One example [18] is the class of functions g such that g(x) + x maps E^d to itself (that is: g(x) + x is in E^d for all x in E^d). This is because, for every such function, the function f(x) := g(x) + x satisfies the conditions of Brouwer's fixed-point theorem. Then x is a fixed point of f if and only if x is a root of g, and x is an ε-residual fixed point of f if and only if x is an ε-root of g. Chen and Deng [18] show that the discrete variants of these problems are computationally equivalent: both problems require Θ(n^(d−1)) function evaluations.

Communication complexity

Roughgarden and Weinstein [21] studied the communication complexity of computing an approximate fixed point. In their model, there are two agents: one knows a function f and the other knows a function g. Both functions are Lipschitz continuous and satisfy Brouwer's conditions. The goal is to compute an approximate fixed point of the composition f ∘ g. They show that the deterministic communication complexity is in Ω(2^d).


References

  1. The Computation of Fixed Points and Applications. Lecture Notes in Economics and Mathematical Systems. Vol. 124. 1976. doi:10.1007/978-3-642-50327-6. ISBN 978-3-540-07685-8.
  2. Shellman, Spencer; Sikorski, K. (December 2003). "A recursive algorithm for the infinity-norm fixed point problem". Journal of Complexity. 19 (6): 799–834. doi:10.1016/j.jco.2003.06.001.
  3. Hirsch, Michael D; Papadimitriou, Christos H; Vavasis, Stephen A (December 1989). "Exponential lower bounds for finding Brouwer fix points". Journal of Complexity. 5 (4): 379–416. doi:10.1016/0885-064X(89)90017-4. S2CID 1727254.
  4. Sikorski, K; Woźniakowski, H (December 1987). "Complexity of fixed points, I". Journal of Complexity. 3 (4): 388–405. doi:10.1016/0885-064X(87)90008-2.
  5. Sikorski, Krzysztof A. (2001). Optimal Solution of Nonlinear Equations. Oxford University Press. ISBN 978-0-19-510690-9.
  6. Sikorski, K. (1989). "Fast Algorithms for the Computation of Fixed Points". Robustness in Identification and Control. pp. 49–58. doi:10.1007/978-1-4615-9552-6_4. ISBN 978-1-4615-9554-0.
  7. Huang, Z; Khachiyan, L; Sikorski, K (June 1999). "Approximating Fixed Points of Weakly Contracting Mappings". Journal of Complexity. 15 (2): 200–213. doi:10.1006/jcom.1999.0504.
  8. Shellman, Spencer; Sikorski, K. (June 2002). "A Two-Dimensional Bisection Envelope Algorithm for Fixed Points". Journal of Complexity. 18 (2): 641–659. doi:10.1006/jcom.2001.0625.
  9. Shellman, Spencer; Sikorski, K. (September 2003). "Algorithm 825: A deep-cut bisection envelope algorithm for fixed points". ACM Transactions on Mathematical Software. 29 (3): 309–325. doi:10.1145/838250.838255. S2CID 7786886.
  10. Kellogg, R. B.; Li, T. Y.; Yorke, J. (September 1976). "A Constructive Proof of the Brouwer Fixed-Point Theorem and Computational Results". SIAM Journal on Numerical Analysis. 13 (4): 473–483. doi:10.1137/0713041.
  11. Smale, Steve (July 1976). "A convergent process of price adjustment and global newton methods". Journal of Mathematical Economics. 3 (2): 107–120. doi:10.1016/0304-4068(76)90019-7.
  12. Scarf, Herbert (September 1967). "The Approximation of Fixed Points of a Continuous Mapping". SIAM Journal on Applied Mathematics. 15 (5): 1328–1343. doi:10.1137/0115116.
  13. H. Scarf found the first algorithmic proof: Voitsekhovskii, M.I. (2001) [1994]. "Brouwer theorem". Encyclopedia of Mathematics. EMS Press. ISBN 1-4020-0609-8.
  14. Kuhn, Harold W. (1968). "Simplicial Approximation of Fixed Points". Proceedings of the National Academy of Sciences of the United States of America. 61 (4): 1238–1242. doi:10.1073/pnas.61.4.1238. JSTOR 58762. PMC 225246. PMID 16591723.
  15. Merrill, Orin Harrison (1972). Applications and Extensions of an Algorithm that Computes Fixed Points of Certain Upper Semi-continuous Point to Set Mappings (Thesis). OCLC 570461463. NAID 10006142329.
  16. Eaves, B. Curtis (December 1972). "Homotopies for computation of fixed points". Mathematical Programming. 3 (1): 1–22. doi:10.1007/BF01584975. S2CID 39504380.
  17. Gale, David (1979). "The Game of Hex and Brouwer Fixed-Point Theorem". The American Mathematical Monthly. 86 (10): 818–827. doi:10.2307/2320146. JSTOR 2320146.
  18. Chen, Xi; Deng, Xiaotie (2005). "On algorithms for discrete and approximate brouwer fixed points". Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing. pp. 323–330. doi:10.1145/1060590.1060638. ISBN 1581139608. S2CID 16942881.
  19. Chen, Xi; Deng, Xiaotie (October 2009). "On the complexity of 2D discrete fixed point problem". Theoretical Computer Science. 410 (44): 4448–4456. doi:10.1016/j.tcs.2009.07.052. S2CID 2831759.
  20. Sikorski, K. (June 1984). "Optimal solution of nonlinear equations satisfying a Lipschitz condition". Numerische Mathematik. 43 (2): 225–240. doi:10.1007/BF01390124. S2CID 120937024.
  21. Roughgarden, Tim; Weinstein, Omri (2016). "On the Communication Complexity of Approximate Fixed Points". 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). pp. 229–238. doi:10.1109/FOCS.2016.32. ISBN 978-1-5090-3933-3. S2CID 87553.

Further reading