Quadratic pseudo-Boolean optimization

Quadratic pseudo-Boolean optimisation (QPBO) is a combinatorial optimization method for quadratic pseudo-Boolean functions of the form

    f(x) = w_0 + Σ_{p ∈ V} w_p(x_p) + Σ_{(p, q) ∈ E} w_pq(x_p, x_q)

in the binary variables x_p ∈ {0, 1} for each p ∈ V = {1, ..., n}, with E ⊆ V × V. If f is submodular, then QPBO produces a global optimum, equivalent to that of graph cut optimization, while if f contains non-submodular terms the algorithm produces a partial solution with specific optimality properties, in both cases in polynomial time. [1]

QPBO is a useful tool for inference on Markov random fields and conditional random fields, and has applications in computer vision problems such as image segmentation and stereo matching. [2]

Optimization of non-submodular functions

If the coefficients w_pq of the quadratic terms satisfy the submodularity condition

    w_pq(0, 0) + w_pq(1, 1) ≤ w_pq(0, 1) + w_pq(1, 0)

then the function can be efficiently optimised with graph cut optimization: it can be represented by a graph with non-negative edge weights, and the global minimum can be found in polynomial time by computing a minimum cut of the graph, using max-flow algorithms such as Ford–Fulkerson, Edmonds–Karp, or Boykov–Kolmogorov.
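For instance, the following minimal sketch minimises a small submodular function of two variables by computing a minimum cut. It assumes the third-party PyMaxflow Python bindings for the Boykov–Kolmogorov algorithm; the example function and the labelling convention spelled out in the comments are choices of this sketch, not part of QPBO itself.

```python
# Minimise f(x1, x2) = 2*x1 + (1 - x2) + 3*|x1 - x2| over x1, x2 in {0, 1}.
# The pairwise term satisfies w(0,0) + w(1,1) <= w(0,1) + w(1,0), so a single
# minimum cut yields the global minimum.
import maxflow  # PyMaxflow (assumed dependency)

g = maxflow.Graph[float]()
x1, x2 = g.add_nodes(2)

# Unary terms: add_tedge(node, cost if the node is labelled 1, cost if labelled 0),
# where label 1 means "sink side of the cut" and label 0 means "source side".
g.add_tedge(x1, 2.0, 0.0)   # term 2*x1
g.add_tedge(x2, 0.0, 1.0)   # term (1 - x2)

# Pairwise term 3*|x1 - x2|: a symmetric submodular term becomes a pair of
# opposite directed edges of equal capacity between the two variable nodes.
g.add_edge(x1, x2, 3.0, 3.0)

g.maxflow()
labels = [g.get_segment(x1), g.get_segment(x2)]
print(labels)   # [0, 0], the global minimiser with f(0, 0) = 1
```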

If the function is not submodular, then the problem is NP-hard in the general case and it is not always possible to solve it exactly in polynomial time. It is possible to replace the target function with a similar but submodular approximation, e.g. by removing all non-submodular terms or replacing them with submodular approximations, but such an approach is generally sub-optimal and produces satisfactory results only if the number of non-submodular terms is relatively small. [1]

QPBO builds an extended graph, introducing a set of auxiliary variables ideally equivalent to the negations of the variables in the problem. If the two nodes associated with a variable (representing the variable itself and its negation) are separated by the minimum cut of the graph into two different connected components, then the optimal value for that variable is well defined; otherwise it is not possible to infer it. This method produces results generally superior to submodular approximations of the target function. [1]
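Concretely, writing \bar{x}_p for the auxiliary variable intended to equal 1 - x_p, one way to express the extended function that QPBO minimises (a sketch of the roof-duality construction, splitting the pairwise terms according to whether they satisfy the submodularity condition) is:

```latex
f'(x,\bar{x}) = w_0
  + \tfrac{1}{2}\sum_{p \in V} \bigl[\, w_p(x_p) + w_p(1-\bar{x}_p) \,\bigr]
  + \tfrac{1}{2}\sum_{(p,q)\ \text{subm.}} \bigl[\, w_{pq}(x_p, x_q) + w_{pq}(1-\bar{x}_p,\, 1-\bar{x}_q) \,\bigr]
  + \tfrac{1}{2}\sum_{(p,q)\ \text{non-subm.}} \bigl[\, w_{pq}(x_p,\, 1-\bar{x}_q) + w_{pq}(1-\bar{x}_p,\, x_q) \,\bigr]
```

By construction f'(x, 1 - x) = f(x), and every term of f' is submodular over the doubled set of variables, so the unconstrained minimum of f' can be computed with a single minimum cut and is a lower bound on the minimum of f; the variables whose node and negation node fall on opposite sides of the cut are exactly those whose optimal value is well defined.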

Properties

QPBO produces a solution where each variable assumes one of three possible values: true, false, and undefined, denoted in the following as 1, 0, and ∅ respectively. The solution has the following two properties.

  • Partial optimality: if f is submodular, then QPBO assigns a value to every variable, and the resulting assignment is a global minimum of the function.
  • Persistence: given the partial solution x generated by QPBO and an arbitrary assignment of values y to the variables, replacing the values of y with the values of x wherever x is defined does not increase the value of the function.
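In symbols, the persistence property states that if x is the partial assignment produced by QPBO and y is any complete assignment, then the assignment y' defined below satisfies f(y') ≤ f(y):

```latex
y'_p =
\begin{cases}
  x_p & \text{if } x_p \neq \emptyset \\
  y_p & \text{otherwise}
\end{cases}
\qquad \text{for each } p \in V .
```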

Algorithm

Figure: graph representing a function of two variables x_p and x_q.

The algorithm can be divided into three steps: graph construction, max-flow computation, and assignment of values to the variables.

When constructing the graph, the set of vertices contains the source and sink nodes s and t, and a pair of nodes p and p' for each variable, representing the variable x_p and its negation respectively. After re-parametrising the function to normal form, [note 1] each term of the function contributes a pair of edges to the graph, each carrying half of the term's weight: unary terms generate edges incident to the terminals, while quadratic terms generate edges between variable nodes.

The minimum cut of the graph can be computed with a max-flow algorithm. In the general case the minimum cut is not unique, and each minimum cut corresponds to a different partial solution; however, it is possible to build a minimum cut such that the number of undefined variables is minimal.

Once the minimum cut is known, each variable receives a value depending upon the position of its corresponding nodes p and p': if p belongs to the connected component containing the source and p' belongs to the connected component containing the sink, then the variable is assigned the value 0. Vice versa, if p belongs to the connected component containing the sink and p' to the one containing the source, then the variable is assigned the value 1. If both nodes belong to the same connected component, then the value of the variable is undefined. [2]
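Putting the three steps together, the following is a minimal Python sketch of the whole procedure, again assuming the third-party PyMaxflow library. The function name qpbo, the data layout, and the way terms are split between original and negated nodes are choices of this sketch (which follows the weight-halving idea described above); it makes no attempt to select the particular minimum cut that minimises the number of undefined variables.

```python
import maxflow  # PyMaxflow (assumed dependency)


def qpbo(num_vars, unary, pairwise):
    """Minimise sum_p w_p(x_p) + sum_pq w_pq(x_p, x_q) over x in {0, 1}^n.

    unary:    dict p -> (w_p(0), w_p(1))
    pairwise: dict (p, q) -> ((w(0,0), w(0,1)), (w(1,0), w(1,1)))
    Returns a list of labels in {0, 1, None}; None marks undefined variables.
    """
    g = maxflow.Graph[float]()
    pos = g.add_nodes(num_vars)                    # node p   (variable x_p)
    neg = g.add_nodes(num_vars)                    # node p'  (ideally 1 - x_p)
    u = [[0.0, 0.0] for _ in range(2 * num_vars)]  # accumulated unary costs

    def add_unary(node, cost0, cost1):
        u[node][0] += cost0
        u[node][1] += cost1

    def add_submodular(i, j, A, B, C, D):
        # Encode a submodular term with A = w(0,0), B = w(0,1), C = w(1,0),
        # D = w(1,1), B + C >= A + D, via the standard decomposition
        # w(x_i, x_j) = A + (C-A) x_i + (D-C) x_j + (B+C-A-D)(1-x_i) x_j.
        add_unary(i, 0.0, C - A)
        add_unary(j, 0.0, D - C)
        g.add_edge(i, j, B + C - A - D, 0.0)       # paid when x_i = 0, x_j = 1

    # Every unary term is split in half between the node and its negation.
    for p, (c0, c1) in unary.items():
        add_unary(pos[p], c0 / 2, c1 / 2)
        add_unary(neg[p], c1 / 2, c0 / 2)          # negated node swaps the costs

    # Every pairwise term is split in half; non-submodular terms couple an
    # original node with a negated one, which makes both halves submodular.
    for (p, q), ((A, B), (C, D)) in pairwise.items():
        if B + C >= A + D:                         # submodular term
            add_submodular(pos[p], pos[q], A / 2, B / 2, C / 2, D / 2)
            add_submodular(neg[p], neg[q], D / 2, C / 2, B / 2, A / 2)
        else:                                      # non-submodular term
            add_submodular(pos[p], neg[q], B / 2, A / 2, D / 2, C / 2)
            add_submodular(neg[p], pos[q], C / 2, D / 2, A / 2, B / 2)

    # Flush the accumulated unary costs as terminal edges, shifted so that all
    # capacities are non-negative (the shift only changes f by a constant).
    for node, (c0, c1) in enumerate(u):
        m = min(c0, c1)
        g.add_tedge(node, c1 - m, c0 - m)          # cost c1 if label 1, c0 if 0

    g.maxflow()

    labels = []
    for p in range(num_vars):
        lp, ln = g.get_segment(pos[p]), g.get_segment(neg[p])
        labels.append(lp if lp != ln else None)    # consistent pair -> defined
    return labels


# Example with one submodular and one non-submodular pairwise term.
print(qpbo(3,
           unary={0: (0.0, 2.0), 1: (1.0, 0.0), 2: (0.5, 0.5)},
           pairwise={(0, 1): ((0.0, 3.0), (3.0, 0.0)),    # submodular
                     (1, 2): ((2.0, 0.0), (0.0, 2.0))}))  # non-submodular
```

Each variable is represented by a node and its negation, and it receives a defined value only when the two nodes end up on opposite sides of the cut, matching the assignment rule described above.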

The way undefined variables are handled depends upon the context of the problem. In the general case, given a partition of the graph into two sub-graphs and two solutions, each optimal for one of the sub-graphs, it is possible to combine the two solutions into one solution optimal for the whole graph in polynomial time. [3] However, computing an optimal solution for the subset of undefined variables is still an NP-hard problem. In the context of iterative algorithms such as α-expansion, a reasonable approach is to leave the value of undefined variables unchanged, since the persistence property guarantees that the target function will have non-increasing value (a minimal sketch of this update is given below). [1] Different exact and approximate strategies to minimise the number of undefined variables exist. [2]
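Within an iterative scheme the update can be written as in the following sketch, where the variable names are hypothetical: current_labels is the labelling before the move and qpbo_labels is the partial solution, with None marking undefined variables.

```python
# Keep the previous value wherever QPBO leaves a variable undefined; by the
# persistence property, the energy of new_labels is not larger than that of
# current_labels.
new_labels = [old if new is None else new
              for old, new in zip(current_labels, qpbo_labels)]
```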

Higher order terms

It is always possible to reduce a higher-order function to a quadratic function which is equivalent with respect to the optimisation, a problem known as "higher-order clique reduction" (HOCR), and the result of such a reduction can be optimized with QPBO. Generic methods for the reduction of arbitrary functions rely on specific substitution rules and in the general case they require the introduction of auxiliary variables. [4] In practice most terms can be reduced without introducing additional variables, resulting in a simpler optimization problem, while the remaining terms can be reduced either exactly, with the addition of auxiliary variables, or approximately, without the addition of any new variable. [5]
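As an illustration, one commonly used substitution rule (a sketch covering only cubic monomials with non-positive coefficient; positive coefficients and higher degrees require the other rules discussed in the cited references) replaces the monomial with a quadratic expression involving one auxiliary binary variable w:

```latex
a\, x_1 x_2 x_3 \;=\; \min_{w \in \{0,1\}} \; a\, w\, (x_1 + x_2 + x_3 - 2),
\qquad a \le 0 .
```

Minimising the right-hand side jointly over the original variables and w yields the same values as the original cubic term, and the right-hand side is quadratic, hence amenable to QPBO.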

References

  1. Kolmogorov and Rother (2007).
  2. Rother et al. (2007).
  3. Billionnet and Jaumard (1989).
  4. Fix et al. (2011).
  5. Ishikawa (2014).


Notes

  1. The representation of a pseudo-Boolean function with coefficients w is not unique, and if two coefficient vectors w and w' represent the same function then w' is said to be a reparametrisation of w and vice versa. In some constructions it is useful to ensure that the function has a specific form, called normal form, which is always defined for any function, although it is generally not unique. A function is in normal form if the two following conditions hold (Kolmogorov and Rother (2007)):
    1. min{w_p(0), w_p(1)} = 0 for each p ∈ V;
    2. min{w_pq(0, j), w_pq(1, j)} = 0 for each (p, q) ∈ E and for each j ∈ {0, 1}.
    Given an arbitrary function f, it is always possible to find a reparametrisation to normal form with the following algorithm in two steps (Kolmogorov and Rother (2007)):
    1. as long as there exist indices p and q such that the second condition of normality is not satisfied for some value j ∈ {0, 1}, substitute:
      • w_q(j) with w_q(j) + c
      • w_pq(0, j) with w_pq(0, j) - c
      • w_pq(1, j) with w_pq(1, j) - c
      where c = min{w_pq(0, j), w_pq(1, j)};
    2. for each p ∈ V, substitute:
      • w_0 with w_0 + c
      • w_p(0) with w_p(0) - c
      • w_p(1) with w_p(1) - c
      where c = min{w_p(0), w_p(1)}.