Graph cuts in computer vision

Last updated

As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision [1] ), such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph [2] (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph (e.g., normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms).

Contents

"Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum.

History

The theory of graph cuts used as an optimization method was first applied in computer vision in the seminal paper by Greig, Porteous and Seheult [3] of Durham University. Allan Seheult and Bruce Porteous were members of Durham's lauded statistics group of the time, led by Julian Besag and Peter Green, with the optimisation expert Margaret Greig notable as the first ever female member of staff of the Durham Mathematical Sciences Department.

In the Bayesian statistical context of smoothing noisy (or corrupted) images, they showed how the maximum a posteriori estimate of a binary image can be obtained exactly by maximizing the flow through an associated image network, involving the introduction of a source and sink. The problem was therefore shown to be efficiently solvable. Prior to this result, approximate techniques such as simulated annealing (as proposed by the Geman brothers), [4] or iterated conditional modes (a type of greedy algorithm suggested by Julian Besag) [5] were used to solve such image smoothing problems.

Although the general -colour problem remains unsolved for the approach of Greig, Porteous and Seheult [3] has turned out [6] [7] to have wide applicability in general computer vision problems. Greig, Porteous and Seheult's approaches are often applied iteratively to a sequence of binary problems, usually yielding near optimal solutions.

In 2011, C. Couprie et al. [8] proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued indicator function from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent . When , the Power Watershed is optimized by graph cuts, when the Power Watershed is optimized by shortest paths, is optimized by the random walker algorithm and is optimized by the watershed algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms.

Binary segmentation of images

Notation

Existing methods

  1. First step optimizes over the color parameters using K-means.
  2. Second step performs the usual graph cuts algorithm.
These 2 steps are repeated recursively until convergence.

Energy function

where the energy is composed of two different models ( and ):

Likelihood / Color model / Regional term

— unary term describing the likelihood of each color.

  • This term can be modeled using different local (e.g. texons) or global (e.g. histograms, GMMs, Adaboost likelihood) approaches that are described below.
Histogram
  • We use intensities of pixels marked as seeds to get histograms for object (foreground) and background intensity distributions: P(I|O) and P(I|B).
  • Then, we use these histograms to set the regional penalties as negative log-likelihoods.
GMM (Gaussian mixture model)
  • We usually use two distributions: one for background modelling and another for foreground pixels.
  • Use a Gaussian mixture model (with 5–8 components) to model those 2 distributions.
  • Goal: Try to pull apart those two distributions.
Texon
  • A texon (or texton) is a set of pixels that has certain characteristics and is repeated in an image.
  • Steps:
  1. Determine a good natural scale for the texture elements.
  2. Compute non-parametric statistics of the model-interior texons, either on intensity or on Gabor filter responses.

Prior / Coherence model / Boundary term

— binary term describing the coherence between neighborhood pixels.

  • In practice, pixels are defined as neighbors if they are adjacent either horizontally, vertically or diagonally (4 way connectivity or 8 way connectivity for 2D images).
  • Costs can be based on local intensity gradient, Laplacian zero-crossing, gradient direction, color mixture model,...
  • Different energy functions have been defined:
    • Standard Markov random field: Associate a penalty to disagreeing pixels by evaluating the difference between their segmentation label (crude measure of the length of the boundaries). See Boykov and Kolmogorov ICCV 2003
    • Conditional random field: If the color is very different, it might be a good place to put a boundary. See Lafferty et al. 2001; Kumar and Hebert 2003

Criticism

Graph cuts methods have become popular alternatives to the level set-based approaches for optimizing the location of a contour (see [9] for an extensive comparison). However, graph cut approaches have been criticized in the literature for several issues:

Algorithm

Implementation (exact)

The Boykov-Kolmogorov algorithm [17] is an efficient way to compute the max-flow for computer vision-related graphs.

Implementation (approximation)

The Sim Cut algorithm [18] approximates the minimum graph cut. The algorithm implements a solution by simulation of an electrical network. This is the approach suggested by Cederbaum's maximum flow theorem. [19] [20] Acceleration of the algorithm is possible through parallel computing.

Software

Related Research Articles

In computer science and optimization theory, the max-flow min-cut theorem states that in a flow network, the maximum amount of flow passing from the source to the sink is equal to the total weight of the edges in a minimum cut, i.e., the smallest total weight of the edges which if removed would disconnect the source from the sink.

<span class="mw-page-title-main">Maximum flow problem</span> Computational problem in graph theory

In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that obtains the maximum possible flow rate.

<span class="mw-page-title-main">Image segmentation</span> Partitioning a digital image into segments

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases cuts are identified with their cut-sets rather than with their vertex partitions.

<span class="mw-page-title-main">Minimum cut</span> Partition of a graph by removing fewest possible edges

In graph theory, a minimum cut or min-cut of a graph is a cut that is minimal in some metric.

<span class="mw-page-title-main">Watershed (image processing)</span>

In the study of image processing, a watershed is a transformation defined on a grayscale image. The name refers metaphorically to a geological watershed, or drainage divide, which separates adjacent drainage basins. The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges.

<span class="mw-page-title-main">Active contour model</span>

Active contour model, also called snakes, is a framework in computer vision introduced by Michael Kass, Andrew Witkin, and Demetri Terzopoulos for delineating an object outline from a possibly noisy 2D image. The snakes model is popular in computer vision, and snakes are widely used in applications like object tracking, shape recognition, segmentation, edge detection and stereo matching.

<span class="mw-page-title-main">Bundle adjustment</span>

In photogrammetry and computer stereo vision, bundle adjustment is simultaneous refining of the 3D coordinates describing the scene geometry, the parameters of the relative motion, and the optical characteristics of the camera(s) employed to acquire the images, given a set of images depicting a number of 3D points from different viewpoints. Its name refers to the geometrical bundles of light rays originating from each 3D feature and converging on each camera's optical center, which are adjusted optimally according to an optimality criterion involving the corresponding image projections of all points.

In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to the wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation applying graph partitioning via minimum cut or maximum cut. Segmentation-based object categorization can be viewed as a specific case of spectral clustering applied to image segmentation.

<span class="mw-page-title-main">Maximum cut</span> Problem of finding a maximum cut in a graph

For a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.

GrabCut is an image segmentation method based on graph cuts.

In mathematics, the minimum k-cut is a combinatorial optimization problem that requires finding a set of edges whose removal would partition the graph to at least k connected components. These edges are referred to as k-cut. The goal is to find the minimum-weight k-cut. This partitioning can have applications in VLSI design, data-mining, finite elements and communication in parallel computing.

The random walker algorithm is an algorithm for image segmentation. In the first description of the algorithm, a user interactively labels a small number of pixels with known labels, e.g., "object" and "background". The unlabeled pixels are each imagined to release a random walker, and the probability is computed that each pixel's random walker first arrives at a seed bearing each label, i.e., if a user places K seeds, each with a different label, then it is necessary to compute, for each pixel, the probability that a random walker leaving the pixel will first arrive at each seed. These probabilities may be determined analytically by solving a system of linear equations. After computing these probabilities for each pixel, the pixel is assigned to the label for which it is most likely to send a random walker. The image is modeled as a graph, in which each pixel corresponds to a node which is connected to neighboring pixels by edges, and the edges are weighted to reflect the similarity between the pixels. Therefore, the random walk occurs on the weighted graph.

Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function , if it is possible to construct a flow network with positive weights such that

Quadratic pseudo-Boolean optimisation (QPBO) is a combinatorial optimization method for quadratic pseudo-Boolean functions in the form

In dual decomposition a problem is broken into smaller subproblems and a solution to the relaxed problem is found. This method can be employed for MRF optimization. Dual decomposition is applied to markov logic programs as an inference technique.

Semi-global matching (SGM) is a computer vision algorithm for the estimation of a dense disparity map from a rectified stereo image pair, introduced in 2005 by Heiko Hirschmüller while working at the German Aerospace Center. Given its predictable run time, its favourable trade-off between quality of the results and computing time, and its suitability for fast parallel implementation in ASIC or FPGA, it has encountered wide adoption in real-time stereo vision applications such as robotics and advanced driver assistance systems.

Video matting is a technique for separating the video into two or more layers, usually foreground and background, and generating alpha mattes which determine blending of the layers. The technique is very popular in video editing because it allows to substitute the background, or process the layers individually.

Graphical time warping (GTW) is a framework for jointly aligning multiple pairs of time series or sequences. GTW considers both the alignment accuracy of each sequence pair and the similarity among pairs. On contrary, alignment with dynamic time warping (DTW) considers the pairs independently and minimizes only the distance between the two sequences in a given pair. Therefore, GTW generalizes DTW and could achieve a better alignment performance when similarity among pairs is expected.

References

  1. Adelson, Edward H., and James R. Bergen (1991), "The plenoptic function and the elements of early vision", Computational models of visual processing 1.2 (1991).
  2. Boykov, Y., Veksler, O., and Zabih, R. (2001), "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11): 1222-1239.
  3. 1 2 D.M. Greig, B.T. Porteous and A.H. Seheult (1989), Exact maximum a posteriori estimation for binary images , Journal of the Royal Statistical Society, Series B, 51, 271–279.
  4. D. Geman and S. Geman (1984), Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images , IEEE Trans. Pattern Anal. Mach. Intell., 6, 721–741.
  5. J.E. Besag (1986), On the statistical analysis of dirty pictures (with discussion), Journal of the Royal Statistical Society Series B, 48, 259–302
  6. Y. Boykov, O. Veksler and R. Zabih (1998), "Markov Random Fields with Efficient Approximations", International Conference on Computer Vision and Pattern Recognition (CVPR).
  7. 1 2 Y. Boykov, O. Veksler and R. Zabih (2001), "Fast approximate energy minimisation via graph cuts", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1222–1239.
  8. Camille Couprie, Leo Grady, Laurent Najman and Hugues Talbot, "Power Watersheds: A Unifying Graph-Based Optimization Framework”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 7, pp. 1384-1399, July 2011
  9. Leo Grady and Christopher Alvino (2009), "The Piecewise Smooth Mumford-Shah Functional on an Arbitrary Graph", IEEE Trans. on Image Processing, pp. 2547–2561
  10. Yuri Boykov and Vladimir Kolmogorov (2003), "Computing Geodesics and Minimal Surfaces via Graph Cuts", Proc. of ICCV
  11. Ben Appleton and Hugues Talbot (2006), "Globally Minimal Surfaces by Continuous Maximal Flows", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 106–118
  12. Ali Kemal Sinop and Leo Grady, "A Seeded Image Segmentation Framework Unifying Graph Cuts and Random Walker Which Yields A New Algorithm", Proc. of ICCV, 2007
  13. Vladimir Kolmogorov and Yuri Boykov (2005), "What Metrics Can Be Approximated by Geo-Cuts, or Global Optimization of Length/Area and Flux", Proc. of ICCV pp. 564–571
  14. Nicolas Lermé, François Malgouyres and Lucas Létocart (2010), "Reducing graphs in graph cut segmentation Archived 2012-03-27 at the Wayback Machine ", Proc. of ICIP, pp. 3045–3048
  15. Herve Lombaert, Yiyong Sun, Leo Grady, Chenyang Xu (2005), "A Multilevel Banded Graph Cuts Method for Fast Image Segmentation", Proc. of ICCV, pp. 259–265
  16. Yin Li, Jian Sun, Chi-Keung Tang, and Heung-Yeung Shum (2004), "Lazy Snapping", ACM Transactions on Graphics, pp. 303–308
  17. Yuri Boykov, Vladimir Kolmogorov: An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9): 1124–1137 (2004)
  18. P.J. Yim: "Method and System for Image Segmentation," United States Patent US8929636, January 6, 2016
  19. Cederbaum, I. (1962-08-01). "On optimal operation of communication nets". Journal of the Franklin Institute. 274 (2): 130–141. doi:10.1016/0016-0032(62)90401-5. ISSN   0016-0032.
  20. I.T. Frisch, "On Electrical analogs for flow networks," Proceedings of IEEE, 57:2, pp. 209-210, 1969