Graph cut optimization

Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function $f$, if it is possible to construct a flow network with positive weights such that each cut $C$ of the network can be mapped to an assignment of variables $\mathbf{x}$ (and vice versa), and such that the cost of $C$ equals $f(\mathbf{x})$ (up to a constant), then it is possible to find the global optimum of $f$ in polynomial time by computing a minimum cut of the graph. The mapping between cuts and variable assignments is done by representing each variable with one node in the graph: given a cut, each variable has a value of 0 if the corresponding node belongs to the component connected to the source, or 1 if it belongs to the component connected to the sink.
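For concreteness, here is a minimal sketch in Python using the networkx library (an arbitrary choice made for illustration; dedicated max-flow implementations are normally used in practice). It minimises the small submodular function $f(x_1, x_2) = 3x_1 + 2(1 - x_2) + 4(1 - x_1)x_2$ by building the corresponding network by hand, following the construction rules detailed in the sections below.

import networkx as nx

# f(x1, x2) = 3*x1 + 2*(1 - x2) + 4*(1 - x1)*x2, a submodular function.
# Convention: a node on the source side of the cut means x = 0, on the
# sink side means x = 1; cut edges go from the source side to the sink side.
G = nx.DiGraph()
G.add_edge("s", "v1", capacity=3)   # pays 3 when x1 = 1
G.add_edge("v2", "t", capacity=2)   # pays 2 when x2 = 0
G.add_edge("v1", "v2", capacity=4)  # pays 4 when x1 = 0 and x2 = 1

cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
x = {v: 0 if v in source_side else 1 for v in ("v1", "v2")}
print(cut_value, x)  # 2 {'v1': 0, 'v2': 0}: the global minimum is f(0, 0) = 2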

Not all pseudo-Boolean functions can be represented by a flow network, and in the general case the global optimization problem is NP-hard. There exist sufficient conditions to characterise families of functions that can be optimised through graph cuts, such as submodular quadratic functions. Graph cut optimization can be extended to functions of discrete variables with a finite number of values, that can be approached with iterative algorithms with strong optimality properties, computing one graph cut at each iteration.

Graph cut optimization is an important tool for inference over graphical models such as Markov random fields or conditional random fields, and it has applications in computer vision problems such as image segmentation,[1][2] denoising,[3] registration[4][5] and stereo matching.[6][7]

Representability

A pseudo-Boolean function $f \colon \{0,1\}^n \to \mathbb{R}$ is said to be representable if there exists a graph $G = (V, E)$ with non-negative weights and with source and sink nodes $s$ and $t$ respectively, and there exists a set of nodes $V_0 = \{v_1, \dots, v_n\} \subseteq V \setminus \{s, t\}$ such that, for each tuple of values $(x_1, \dots, x_n) \in \{0,1\}^n$ assigned to the variables, $f(x_1, \dots, x_n)$ equals (up to a constant) the value of the flow determined by a minimum cut $C = (S, T)$ of the graph such that $v_i \in S$ if $x_i = 0$ and $v_i \in T$ if $x_i = 1$.[8]

It is possible to classify pseudo-Boolean functions according to their order, determined by the maximum number of variables contributing to each single term. All first-order functions, where each term depends upon at most one variable, are always representable. Quadratic functions

$$f(x_1, \dots, x_n) = w_0 + \sum_i w_i(x_i) + \sum_{i < j} w_{ij}(x_i, x_j)$$

are representable if and only if they are submodular, i.e. for each quadratic term the following condition is satisfied:

$$w_{ij}(0,0) + w_{ij}(1,1) \le w_{ij}(0,1) + w_{ij}(1,0).$$

Cubic functions

$$f(x_1, \dots, x_n) = w_0 + \sum_i w_i(x_i) + \sum_{i < j} w_{ij}(x_i, x_j) + \sum_{i < j < k} w_{ijk}(x_i, x_j, x_k)$$

are representable if and only if they are regular, i.e. all possible binary projections to two variables, obtained by fixing the value of the remaining variable, are submodular. For higher-order functions, regularity is a necessary condition for representability.[8]
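Regularity can be checked directly from this definition by enumerating the six binary projections of a ternary term. A minimal sketch in Python (the function names are illustrative, not from any standard library):

def is_submodular_binary(w):
    # Submodularity condition for a term of two binary variables.
    return w(0, 0) + w(1, 1) <= w(0, 1) + w(1, 0)

def is_regular_ternary(w):
    # A ternary term is regular if every binary projection, obtained by
    # fixing one of the three variables to 0 or 1, is submodular.
    for fixed_pos in range(3):
        for fixed_val in (0, 1):
            def projection(a, b, p=fixed_pos, v=fixed_val):
                args = [a, b]
                args.insert(p, v)   # re-insert the fixed variable
                return w(*args)
            if not is_submodular_binary(projection):
                return False
    return True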

Graph construction

Graph construction for a representable function is simplified by the fact that the sum of two representable functions $f_1$ and $f_2$ is representable, and its graph is the union of the graphs $G_1$ and $G_2$ representing the two functions. This theorem allows one to build separate graphs representing each term and combine them to obtain a graph representing the entire function.[8]

The graph representing a quadratic function of $n$ variables contains $n + 2$ vertices, two of them representing the source and sink and the others representing the variables. When representing higher-order functions, the graph contains additional auxiliary nodes that make it possible to model higher-order interactions.

Unary terms

A unary term $w_i$ depends only on one variable $x_i$ and can be represented by a graph with one non-terminal node $v_i$ and one edge $s \to v_i$ with weight $w_i(1) - w_i(0)$ if $w_i(1) > w_i(0)$, or $v_i \to t$ with weight $w_i(0) - w_i(1)$ if $w_i(1) < w_i(0)$.[8]
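This rule translates directly into code. The sketch below, continuing the networkx example from the introduction, also shows how edges contributed by different terms accumulate, in line with the additivity property described above (bump and add_unary are illustrative helper names):

def bump(G, u, v, c):
    # Accumulate capacity c > 0 on the edge (u, v); parallel edges coming
    # from different terms merge by adding their weights.
    if c <= 0:
        return
    if G.has_edge(u, v):
        G[u][v]["capacity"] += c
    else:
        G.add_edge(u, v, capacity=c)

def add_unary(G, v, w0, w1):
    # Term w(x_v) with w(0) = w0 and w(1) = w1, represented up to an
    # additive constant (constants do not affect the argmin).
    if w1 > w0:
        bump(G, "s", v, w1 - w0)  # edge cut when x_v = 1
    else:
        bump(G, v, "t", w0 - w1)  # edge cut when x_v = 0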

Binary terms

[Figure: Graph cut binary.svg. Example of a graph representing a quadratic term $w_{ij}(x_i, x_j)$ in case $w_{ij}(1,0) - w_{ij}(0,0) > 0$ and $w_{ij}(1,1) - w_{ij}(1,0) < 0$.]

A quadratic (or binary) term $w_{ij}$ can be represented by a graph containing two non-terminal nodes $v_i$ and $v_j$. The term can be rewritten as

$$w_{ij}(x_i, x_j) = w_{ij}(0,0) + k_i x_i + k_j x_j + k_{ij} (1 - x_i) x_j$$

with

$$k_i = w_{ij}(1,0) - w_{ij}(0,0)$$
$$k_j = w_{ij}(1,1) - w_{ij}(1,0)$$
$$k_{ij} = w_{ij}(0,1) + w_{ij}(1,0) - w_{ij}(0,0) - w_{ij}(1,1).$$

In this expression, the first term is constant and is not represented by any edge; the two following terms depend on one variable each and are represented by one edge, as shown in the previous section for unary terms; and the third term is represented by an edge $v_i \to v_j$ with weight $k_{ij}$ (submodularity guarantees that this weight is non-negative).[8]
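This decomposition gives a direct recipe for adding a submodular binary term to the network. A sketch reusing the illustrative helpers from the unary section:

def add_binary(G, vi, vj, w00, w01, w10, w11):
    # w(xi, xj) = w00 + ki*xi + kj*xj + kij*(1 - xi)*xj
    ki = w10 - w00
    kj = w11 - w10
    kij = w01 + w10 - w00 - w11  # non-negative iff the term is submodular
    if kij < 0:
        raise ValueError("term is not submodular")
    add_unary(G, vi, 0, ki)      # represents ki*xi
    add_unary(G, vj, 0, kj)      # represents kj*xj
    bump(G, vi, vj, kij)         # edge cut when x_i = 0 and x_j = 1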

Ternary terms

A cubic (or ternary) term $w_{ijk}$ can be represented by a graph with four non-terminal nodes, three of them ($v_i$, $v_j$ and $v_k$) associated to the three variables plus a fourth auxiliary node $v_{ijk}$. [note 1] A generic ternary term can be rewritten as the sum of a constant, three unary terms, three binary terms, and a ternary term in simplified form. There may be two different cases, according to the sign of

$$p = w_{ijk}(0,0,0) + w_{ijk}(0,1,1) + w_{ijk}(1,0,1) + w_{ijk}(1,1,0) - w_{ijk}(1,0,0) - w_{ijk}(0,1,0) - w_{ijk}(0,0,1) - w_{ijk}(1,1,1).$$

If $p > 0$ then

$$w_{ijk}(x_i, x_j, x_k) = w_{ijk}(0,0,0) + p_1 x_i + p_2 x_j + p_3 x_k + p_{12} (1 - x_i) x_j + p_{23} (1 - x_j) x_k + p_{31} (1 - x_k) x_i - p \, x_i x_j x_k$$
[Figure: Graph cut ternary.svg. Example of a graph representing the ternary term $p \, x_i x_j x_k$ when $p > 0$ (left) and when $p < 0$ (right).]

with

$$p_1 = w_{ijk}(1,0,1) - w_{ijk}(0,0,1)$$
$$p_2 = w_{ijk}(1,1,0) - w_{ijk}(1,0,0)$$
$$p_3 = w_{ijk}(0,1,1) - w_{ijk}(0,1,0)$$
$$p_{12} = w_{ijk}(0,1,0) + w_{ijk}(1,0,0) - w_{ijk}(0,0,0) - w_{ijk}(1,1,0)$$
$$p_{23} = w_{ijk}(0,0,1) + w_{ijk}(0,1,0) - w_{ijk}(0,0,0) - w_{ijk}(0,1,1)$$
$$p_{31} = w_{ijk}(1,0,0) + w_{ijk}(0,0,1) - w_{ijk}(0,0,0) - w_{ijk}(1,0,1).$$

If $p < 0$ the construction is similar, but the variables will have opposite values. If the function is regular, then all its projections onto two variables will be submodular, implying that $p_{12}$, $p_{23}$ and $p_{31}$ are non-negative, so that all terms in the new representation are submodular.

In this decomposition, the constant, unary and binary terms can be represented as shown in the previous sections. If $p > 0$ the simplified ternary term can be represented with a graph with four edges $v_i \to v_{ijk}$, $v_j \to v_{ijk}$, $v_k \to v_{ijk}$ and $v_{ijk} \to t$, all with weight $p$, while if $p < 0$ the term can be represented by four edges $v_{ijk} \to v_i$, $v_{ijk} \to v_j$, $v_{ijk} \to v_k$ and $s \to v_{ijk}$, with weight $-p$.[8]
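The same helpers can sketch the four edges of the simplified ternary term; the minimum cut automatically places the auxiliary node on whichever side is cheaper (an illustrative sketch, and as before the term is represented up to an additive constant):

def add_trilinear(G, vi, vj, vk, aux, p):
    # Simplified ternary term -p*xi*xj*xk when p > 0, or p*xi*xj*xk when
    # p < 0; aux is the auxiliary node v_ijk.
    if p > 0:
        for v in (vi, vj, vk):
            bump(G, v, aux, p)   # cut when x_v = 0 and aux on the sink side
        bump(G, aux, "t", p)     # cut when aux stays on the source side
    elif p < 0:
        for v in (vi, vj, vk):
            bump(G, aux, v, -p)  # mirror construction with roles flipped
        bump(G, "s", aux, -p)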

Minimum cut

After building a graph representing a pseudo-Boolean function, it is possible to compute a minimum cut using one among the various algorithms developed for flow networks, such as the Ford–Fulkerson, Edmonds–Karp, and Boykov–Kolmogorov algorithms. The result is a partition of the graph into two connected components $S$ and $T$ such that $s \in S$ and $t \in T$, and the function attains its global minimum when $x_i = 0$ for each $i$ such that the corresponding node $v_i \in S$, and $x_i = 1$ for each $i$ such that $v_i \in T$.
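For instance, with networkx the Boykov–Kolmogorov algorithm can be selected as the flow routine, and the assignment is then read off the two sides of the resulting partition (a sketch under the same conventions as the earlier examples; minimize is an illustrative name):

import networkx as nx
from networkx.algorithms.flow import boykov_kolmogorov

def minimize(G, variables):
    # variables: mapping from each variable index to its node name in G.
    cut_value, (S, T) = nx.minimum_cut(G, "s", "t",
                                       flow_func=boykov_kolmogorov)
    # x_i = 0 if its node is on the source side, 1 on the sink side.
    return cut_value, {i: 0 if v in S else 1 for i, v in variables.items()}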

Max-flow algorithms such as Boykov–Kolmogorov's are very efficient in practice for sequential computation, but they are difficult to parallelise, making them unsuitable for distributed computing applications and preventing them from exploiting the potential of modern multi-core CPUs. Parallel max-flow algorithms were developed, such as push-relabel[9] and jump-flood,[1] that can also take advantage of hardware acceleration in GPGPU implementations.[10][1][11]

Functions of discrete variables with more than two values

The previous construction allows global optimization of pseudo-Boolean functions only, but it can be extended to quadratic functions of discrete variables with a finite number of values, in the form

$$f(x_1, \dots, x_n) = \sum_i D_i(x_i) + \sum_{(i,j) \in N} S_{ij}(x_i, x_j)$$

where $N \subseteq \{1, \dots, n\} \times \{1, \dots, n\}$ and $x_i \in \Lambda = \{1, \dots, k\}$. The function $D_i$ represents the unary contribution of each variable (often referred to as the data term), while the function $S_{ij}$ represents binary interactions between variables (the smoothness term). In the general case, optimization of such functions is an NP-hard problem, and stochastic optimization methods such as simulated annealing are sensitive to local minima and in practice can generate arbitrarily sub-optimal results. [note 2] With graph cuts it is possible to construct move-making algorithms that reach in polynomial time a local minimum with strong optimality properties for a wide family of quadratic functions of practical interest (when the binary interaction $S_{ij}$ is a metric or a semimetric), such that the value of the function in the solution lies within a constant and known factor from the global optimum.[12]

Given a function $f \colon \Lambda^n \to \mathbb{R}$ with $\Lambda = \{1, \dots, k\}$, and a certain assignment of values $x = (x_1, \dots, x_n)$ to the variables, it is possible to associate each assignment $x$ to a partition $P = \{P_l \mid l \in \Lambda\}$ of the set of variables, such that $P_l = \{i \mid x_i = l\}$. Given two distinct assignments $x$ and $x'$ and a value $\alpha \in \Lambda$, a move that transforms $x$ into $x'$ is said to be an $\alpha$-expansion if $P_\alpha \subseteq P'_\alpha$ and $P'_l \subseteq P_l$ for each $l \ne \alpha$. Given a couple of values $\alpha$ and $\beta$, a move is said to be an $\alpha\beta$-swap if $P_l = P'_l$ for each $l \ne \alpha, \beta$. Intuitively, an $\alpha$-expansion move from $x$ assigns the value $\alpha$ to some variables that have a different value in $x$, while an $\alpha\beta$-swap move assigns $\alpha$ to some variables that have value $\beta$ in $x$ and vice versa.
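The assignment-to-partition mapping is mechanical; a small sketch for concreteness (partition is an illustrative name):

from collections import defaultdict

def partition(x):
    # P[l] is the set of variable indices currently labelled l.
    P = defaultdict(set)
    for i, label in enumerate(x):
        P[label].add(i)
    return P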

At each iteration, the $\alpha$-expansion algorithm computes, for each possible value $\alpha$, the minimum of the function among all assignments $x'$ that can be reached with a single $\alpha$-expansion move from the current temporary solution $x$, and takes it as the new temporary solution.

x := arbitrary assignment
while f decreased during the last cycle:
    foreach α ∈ Λ:
        x′ := arg min of f among all assignments within one α-expansion move from x
        if f(x′) < f(x):
            x := x′

The $\alpha\beta$-swap algorithm is similar, but it searches for the minimum among all assignments $x'$ reachable with a single $\alpha\beta$-swap move from $x$.

x := arbitrary assignment
while f decreased during the last cycle:
    foreach (α, β) ∈ Λ × Λ, α ≠ β:
        x′ := arg min of f among all assignments within one αβ-swap move from x
        if f(x′) < f(x):
            x := x′

In both cases, the optimization problem in the innermost loop can be solved exactly and efficiently with a graph cut. Both algorithms are guaranteed to terminate in a finite number of iterations of the outer loop, and in practice this number is small, with most of the improvement happening at the first iteration. The algorithms can generate different solutions depending on the initial guess, but in practice they are robust with respect to initialisation, and starting with a point where all variables are assigned to the same random value is usually sufficient to produce good quality results.[12]
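The shared outer-loop structure of the two algorithms can be sketched as follows; best_move is a hypothetical helper returning the minimiser of $f$ among assignments one move away from $x$ (given a label $\alpha$ for $\alpha$-expansion, or a pair $(\alpha, \beta)$ for $\alpha\beta$-swap), i.e. the step that is solved exactly with one graph cut:

def move_making(f, moves, x, best_move):
    # moves: all labels (alpha-expansion) or all label pairs (alpha-beta swap).
    improved = True
    while improved:
        improved = False
        for m in moves:
            x_new = best_move(x, m)   # hypothetical: one graph cut per call
            if f(x_new) < f(x):
                x, improved = x_new, True
    return x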

The solution generated by such algorithms is not necessarily a global optimum, but it has strong guarantees of optimality. If $S_{ij}$ is a metric and $x^*$ is a solution generated by the $\alpha$-expansion algorithm, or if $S_{ij}$ is a semimetric and $x^*$ is a solution generated by the $\alpha\beta$-swap algorithm, then $f(x^*)$ lies within a known and constant factor from the global minimum $f(\hat{x})$:[12]

$$f(x^*) \le 2 \, \frac{\max_{\alpha \ne \beta \in \Lambda} S_{ij}(\alpha, \beta)}{\min_{\alpha \ne \beta \in \Lambda} S_{ij}(\alpha, \beta)} \, f(\hat{x}).$$
Non-submodular functions

Generally speaking, the problem of optimizing a non-submodular pseudo-Boolean function is NP-hard and cannot be solved in polynomial time with a simple graph cut. The simplest approach is to approximate the function with a similar but submodular one, for instance by truncating all non-submodular terms or replacing them with similar submodular expressions. Such an approach is generally sub-optimal, and it produces acceptable results only if the number of non-submodular terms is relatively small.[13]

In the case of quadratic non-submodular functions, it is possible to compute in polynomial time a partial solution using algorithms such as QPBO.[13] Higher-order functions can be reduced in polynomial time to a quadratic form that can be optimised with QPBO.[14]

Higher-order functions

Quadratic functions are extensively studied and have been characterised in detail, but more general results have been derived for higher-order functions as well. While quadratic functions can indeed model many problems of practical interest, they are limited by the fact that they can represent only binary interactions between variables. The possibility to capture higher-order interactions allows one to better capture the nature of the problem, and it can provide higher quality results that could be difficult to achieve with quadratic models. For instance, in computer vision applications, where each variable represents a pixel or voxel of the image, higher-order interactions can be used to model texture information, which would be difficult to capture using only quadratic functions.[15]

Sufficient conditions analogous to submodularity were developed to characterise higher-order pseudo-Boolean functions that can be optimised in polynomial time,[16] and there exist algorithms analogous to $\alpha$-expansion and $\alpha\beta$-swap for some families of higher-order functions.[15] The problem is NP-hard in the general case, and approximate methods were developed for fast optimization of functions that do not satisfy such conditions.[16][17]

Notes

  1. Adding one auxiliary node is necessary: graphs without auxiliary nodes can represent only binary interactions between variables.
  2. Algorithms such as simulated annealing have strong theoretical convergence properties for temperature schedules that decrease to zero sufficiently slowly. Such schedules cannot be realised in practice.


References

  1. Peng et al. (2015).
  2. Rother et al. (2012).
  3. Lombaert and Cheriet (2012).
  4. So et al. (2011).
  5. Tang and Chung (2007).
  6. Kim et al. (2003).
  7. Hong and Chen (2004).
  8. Kolmogorov and Zabin (2004).
  9. Goldberg and Tarjan (1988).
  10. Vineet and Narayanan (2008).
  11. Stich (2009).
  12. Boykov et al. (2001).
  13. Kolmogorov and Rother (2007).
  14. Ishikawa (2014).
  15. Kohli et al. (2009).
  16. Freedman and Drineas (2005).
  17. Kohli et al. (2008).

Bibliography