Graph cuts in computer vision


As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision [1] ), such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, and many other problems that can be formulated in terms of energy minimization.


Graph cut techniques are now increasingly being used in combination with more general spatial artificial intelligence techniques, for example to enforce spatial structure in large language model output (such as sharpening tumour boundaries in medical imaging), and similarly in self-driving car, robotics and mapping applications.

Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph [2] (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution.

Although many computer vision algorithms involve cutting a graph (e.g. normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms).

"Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum.

History

The foundational theory of graph cuts was first applied in computer vision in a seminal 1989 paper by Margaret Greig, Bruce Porteous and Allan Seheult [3] (GPS) of Durham University. In the Bayesian statistical context of smoothing noisy (or corrupted) binary images, using a Markov random field as the image prior distribution to ensure spatial consistency, they showed with a deceptively simple argument how the maximum a posteriori estimate of a binary image can be obtained exactly: by maximizing the flow through an associated image network, involving the introduction of a source and a sink. The problem was therefore shown to be efficiently solvable exactly, an unexpected result, since problems of this vast size were believed at the time to be computationally intractable (NP hard).

Prior to this result, approximate local optimisation techniques such as simulated annealing (as proposed by the Geman brothers) [4] or iterated conditional modes (a type of greedy algorithm suggested by Julian Besag) [5] were used to solve such image smoothing problems. The GPS paper combined ideas from mathematical statistics (Bayes' theorem), physics (the Ising model), optimisation (energy functions) and computer science (network flow problems), and led the move away from approximate local optimisation approaches (e.g. simulated annealing) towards more powerful exact, or near-exact, global optimisation techniques.

GPS also addressed the computational cost of the max-flow algorithm on large grids, a significant concern at the time. They proposed a partitioning algorithm (see Section 4 of GPS) involving the recursive amalgamation of non-overlapping blocks, which gave roughly a twelvefold speed-up at the time. This approach recursively solved and amalgamated independent sub-graphs until the whole graph was solved. While contemporaries such as Geman and Geman [4] had advocated parallel processing in the context of simulated annealing, the GPS blocking strategy offered a deterministic structure amenable to parallelisation, anticipating modern designs that distribute computation across multiple GPUs. However, this aspect of the paper was largely ignored, and later research focused on global search trees, such as the Boykov-Kolmogorov algorithm. [6]

Although GPS is now recognised as seminal, it was well ahead of its time; in particular, it was published years before the computing power revolution of Moore's law and GPUs. Significantly, GPS appeared in a mathematical statistics journal rather than a computer vision one, which led to it being overlooked by the computer vision community for many years. It is unofficially known as "The Velvet Underground" paper of computer vision: although very few computer vision researchers read the paper (bought the record), those that did started new research (formed a band).

Although the general k-colour problem is NP hard for k > 2, the GPS approach has turned out to have wide applicability in general computer vision problems. This was first demonstrated by Boykov, Veksler and Zabih, [2] whose seminal paper, published more than ten years after the original GPS paper, together with other important work, sparked the general adoption of graph cut techniques in computer vision. They showed that, for general multi-label problems, the GPS approach can be applied iteratively to sequences of binary problems to yield near-optimal solutions.
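The flavour of this iterative reduction to binary problems can be conveyed by a schematic expansion-move loop. This is only a sketch of the control flow, not the authors' reference implementation; binary_expansion_cut stands in for the binary min-cut construction that decides, for every pixel, whether to keep its current label or switch to the candidate label.

```python
# Sketch of the expansion-move idea: a multi-label problem is reduced to
# a sequence of binary min-cut subproblems, one per candidate label.
# `binary_expansion_cut` is a stand-in for a real min-cut construction.
def alpha_expansion(labels, label_set, energy, binary_expansion_cut, max_sweeps=10):
    for _ in range(max_sweeps):
        changed = False
        for alpha in label_set:
            # Binary subproblem: each pixel keeps its label or switches to alpha.
            proposal = binary_expansion_cut(labels, alpha)
            if energy(proposal) < energy(labels):
                labels, changed = proposal, True
        if not changed:
            break  # no expansion move lowers the energy: a local optimum
    return labels
```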

Despite the foundational nature of the GPS work, formal recognition from the computer vision community has predominantly gone to the researchers who followed them, extending and popularising the graph cut method. For example, Boykov, Veksler and Zabih [2] received a Helmholtz Prize from the ICCV in 2011. [7] This prize recognises ICCV papers from ten or more years earlier that have had a significant impact on computer vision research.

In 2011, Couprie et al. [8] proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued indicator function from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent p. When p = 1, the Power Watershed is optimized by graph cuts, when p = 0 the Power Watershed is optimized by shortest paths, p = 2 is optimized by the random walker algorithm and p = ∞ is optimized by the watershed algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms.

Binary segmentation of images

Notation

Existing methods

Iterated graph cuts alternate two steps (as sketched below):
  1. The first step optimizes over the color parameters using K-means.
  2. The second step performs the usual graph cuts algorithm.
These two steps are repeated until convergence.
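A schematic of this alternation, assuming scikit-learn for the K-means step; graph_cut_step is a placeholder for the min-cut construction sketched earlier, and the cluster count and iteration cap are illustrative choices.

```python
# Sketch of the iterated scheme: alternate color-model fitting (K-means)
# with a graph cut under the current model, until the mask stops changing.
import numpy as np
from sklearn.cluster import KMeans

def iterated_graph_cuts(image, init_mask, graph_cut_step, n_colors=5, max_iters=20):
    mask = init_mask
    for _ in range(max_iters):
        # Step 1: fit per-region color parameters with K-means.
        fg = KMeans(n_clusters=n_colors, n_init=10).fit(image[mask == 1])
        bg = KMeans(n_clusters=n_colors, n_init=10).fit(image[mask == 0])
        # Step 2: resegment with graph cuts under the updated color models.
        new_mask = graph_cut_step(image, fg, bg)
        if np.array_equal(new_mask, mask):
            break  # converged: the segmentation is stable
        mask = new_mask
    return mask
```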

Energy function

The segmentation is obtained by minimizing an energy of the form E = E_color + λ E_coherence, where the energy is composed of two different models (E_color and E_coherence):

Likelihood / Color model / Regional term

E_color — unary term describing the likelihood of each color.

  • This term can be modeled using different local (e.g. texons) or global (e.g. histograms, GMMs, Adaboost likelihood) approaches that are described below.
Histogram
  • We use intensities of pixels marked as seeds to get histograms for object (foreground) and background intensity distributions: P(I|O) and P(I|B).
  • Then, we use these histograms to set the regional penalties as negative log-likelihoods (as in the sketch below).
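A minimal sketch of the histogram-based penalties, assuming numpy and intensities normalized to [0, 1]; the bin count and the smoothing epsilon are illustrative choices.

```python
# Histogram-based regional penalties: negative log-likelihoods of each
# pixel intensity under the seed-derived object/background histograms.
import numpy as np

def histogram_penalties(intensities, obj_seeds, bg_seeds, bins=64, eps=1e-8):
    edges = np.linspace(0.0, 1.0, bins + 1)
    p_obj, _ = np.histogram(obj_seeds, bins=edges, density=True)  # ~ P(I|O)
    p_bg, _ = np.histogram(bg_seeds, bins=edges, density=True)    # ~ P(I|B)
    idx = np.clip(np.digitize(intensities, edges) - 1, 0, bins - 1)
    # Regional terms: R_obj = -ln P(I|O), R_bg = -ln P(I|B)
    return -np.log(p_obj[idx] + eps), -np.log(p_bg[idx] + eps)
```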
GMM (Gaussian mixture model)
  • Two distributions are usually used: one modelling background pixels and another modelling foreground pixels.
  • A Gaussian mixture model (with 5–8 components) is used to model each of these two distributions.
  • Goal: pull the two distributions apart (see the sketch below).
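A sketch of this step using scikit-learn's GaussianMixture; the component count follows the 5–8 range mentioned above, and the seed arrays are assumed to hold one RGB row per seed pixel.

```python
# GMM-based regional penalties: fit one mixture each for foreground and
# background seeds, then score every pixel by negative log-likelihood.
from sklearn.mixture import GaussianMixture

def gmm_penalties(pixels, fg_seeds, bg_seeds, n_components=5):
    fg = GaussianMixture(n_components=n_components).fit(fg_seeds)
    bg = GaussianMixture(n_components=n_components).fit(bg_seeds)
    # score_samples returns per-sample log-likelihood; negate for penalties.
    return -fg.score_samples(pixels), -bg.score_samples(pixels)
```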
Texon
  • A texon (or texton) is a set of pixels that has certain characteristics and is repeated in an image.
  • Steps:
  1. Determine a good natural scale for the texture elements.
  2. Compute non-parametric statistics of the model-interior texons, either on intensity or on Gabor filter responses (see the sketch below).
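A sketch of the Gabor-response option, assuming scikit-image; the frequencies and orientation count are illustrative, not a tuned filter bank.

```python
# Per-pixel texture descriptors from Gabor filter response magnitudes,
# which texton statistics can then be computed over.
import numpy as np
from skimage.filters import gabor

def gabor_features(gray, frequencies=(0.1, 0.2, 0.4), n_orientations=4):
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            real, imag = gabor(gray, frequency=f, theta=k * np.pi / n_orientations)
            feats.append(np.hypot(real, imag))  # response magnitude per pixel
    return np.stack(feats, axis=-1)  # (H, W, n_filters) texture descriptor
```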

Prior / Coherence model / Boundary term

E_coherence — binary (pairwise) term describing the coherence between neighbouring pixels.

  • In practice, pixels are defined as neighbors if they are adjacent either horizontally, vertically or diagonally (4-way or 8-way connectivity for 2D images).
  • Costs can be based on the local intensity gradient, Laplacian zero-crossings, gradient direction, or a color mixture model (a contrast-based example is sketched after this list).
  • Different energy functions have been defined:
    • Standard Markov random field: associate a penalty to disagreeing pixels by evaluating the difference between their segmentation labels (a crude measure of the length of the boundaries). See Boykov and Kolmogorov, ICCV 2003. [10]
    • Conditional random field: if the color is very different, it might be a good place to put a boundary. See Lafferty et al. 2001; Kumar and Hebert 2003.
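As an example of a contrast-based cost, the following sketch computes the widely used exponential weight exp(-(Ip - Iq)^2 / (2 sigma^2)) for 4-way connectivity; numpy is assumed, the image is a normalized grayscale array, and sigma is a user choice.

```python
# Contrast-sensitive boundary weights for 4-connectivity: neighbouring
# pixels with similar intensities get a large weight (expensive to cut),
# while pixels across a strong edge get a small weight (cheap to cut).
import numpy as np

def boundary_weights(gray, sigma=0.1):
    dy = np.exp(-((gray[1:, :] - gray[:-1, :]) ** 2) / (2 * sigma**2))
    dx = np.exp(-((gray[:, 1:] - gray[:, :-1]) ** 2) / (2 * sigma**2))
    return dy, dx  # weights for vertical and horizontal neighbour pairs
```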

Criticism

Graph cuts methods have become popular alternatives to the level set-based approaches for optimizing the location of a contour (see [9] for an extensive comparison). However, graph cut approaches have been criticized in the literature for several issues:

  • Metrication artifacts: when the image is represented by a 4-connected lattice, the cut can exhibit unwanted "blockiness" artifacts; remedies include richer neighbourhood systems [10] [13] and continuous-space max-flow formulations. [11]
  • Shrinking bias: because the algorithm seeks a minimum cut, it can be biased toward short boundaries, making it poorly suited to thin structures such as blood vessels (see [12] for one proposed remedy).
  • Multiple labels: an exact global optimum is only guaranteed for binary labeling problems; multi-label problems are solved only approximately.
  • Memory: memory usage grows quickly with image size, which has motivated graph reduction and banded multilevel methods. [14] [15]

Limitations and modern usage

While graph cuts provide mathematically optimal solutions for specific energy functions, their use as a standalone method for general object recognition declined with the advent of deep learning. The primary limitations include the lack of semantic understanding (the energy is driven by low-level cues such as color and contrast rather than by object category) and the computational and memory cost of solving max-flow on large images.

However, they remain a standard tool for interactive segmentation (e.g., rotoscoping in visual effects), where a user provides the semantic intent (via scribbles) and the algorithm handles the boundary precision. As described below, they are increasingly being integrated into modern deep learning systems.

Algorithm

Implementation (exact)

The Boykov-Kolmogorov algorithm [6] is an efficient way to compute the max-flow for computer vision-related graphs.
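networkx ships an implementation of this algorithm; a minimal usage sketch on a toy graph follows (image graphs would be constructed as in the earlier denoising example).

```python
# Minimal usage of the Boykov-Kolmogorov max-flow implementation in networkx.
import networkx as nx
from networkx.algorithms.flow import boykov_kolmogorov

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("s", "b", capacity=1.0)
G.add_edge("b", "t", capacity=4.0)

residual = boykov_kolmogorov(G, "s", "t")
print(residual.graph["flow_value"])  # max-flow value = min-cut value: 3.0
```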

Implementation (approximation)

The Sim Cut algorithm [17] approximates the minimum graph cut. The algorithm implements a solution by simulation of an electrical network. This is the approach suggested by Cederbaum's maximum flow theorem. [18] [19] Acceleration of the algorithm is possible through parallel computing.

Integration with deep learning

As of the mid-2020s, graph cuts have evolved from standalone solvers into components within deep learning frameworks. While convolutional neural networks (CNNs) and Transformers excel at semantic recognition, they often produce boundaries that lack geometric precision. Graph cut algorithms are used to address this by enforcing global consistency and edge-alignment.

Differentiable graph cuts

Traditional max-flow/min-cut algorithms are discrete and non-differentiable, preventing their direct use in backpropagation. To overcome this, researchers have developed "soft" or differentiable relaxations of the graph cut objective. Methods such as Probabilistic Graph Cuts or SoftCut allow the gradient of the energy function to be computed with respect to the edge weights. This enables a neural network to learn the parameters of the energy function (the cost of cutting specific edges) end-to-end, effectively treating the graph cut solver as a specific layer within the network architecture.
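A generic sketch of such a relaxation follows (deliberately simpler than the published SoftCut or Probabilistic Graph Cuts formulations, which should be consulted directly): replacing hard labels with probabilities turns the cut cost into a sum of edge weights times soft label differences, which autograd can differentiate with respect to both the soft labels and the edge weights. PyTorch is assumed, and the grid sizes are illustrative.

```python
# Generic differentiable relaxation of the min-cut objective: the cut cost
# becomes sum_ij w_ij * |p_i - p_j| over grid neighbours, differentiable in
# both the label probabilities and the (learnable) edge weights.
import torch

def soft_cut_energy(prob, w_vert, w_horiz):
    """prob: (H, W) soft foreground map; w_*: per-edge weights."""
    cut_v = (w_vert * (prob[1:, :] - prob[:-1, :]).abs()).sum()
    cut_h = (w_horiz * (prob[:, 1:] - prob[:, :-1]).abs()).sum()
    return cut_v + cut_h

logits = torch.randn(8, 8, requires_grad=True)
w_v = torch.ones(7, 8, requires_grad=True)   # e.g. predicted by a network
w_h = torch.ones(8, 7, requires_grad=True)
energy = soft_cut_energy(torch.sigmoid(logits), w_v, w_h)
energy.backward()  # gradients flow to both the labels and the edge weights
print(w_v.grad.shape)
```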

As a loss function

In weakly supervised learning and medical image segmentation, the graph cut energy formulation is often utilized as a regularization loss function (often termed "Graph Cut Loss" or "Boundary Loss"). Instead of running a solver during inference, the network is trained to minimize a loss term that approximates the min-cut energy. This penalizes the network for predicting noisy or fuzzy boundaries, forcing the output segmentation to align with high-contrast edges in the source image without requiring an iterative solver at inference time.
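A hedged sketch of such a regularizer (the contrast weighting, sigma and lam are illustrative hyperparameters; published losses differ in detail): label differences across flat image regions are penalized heavily, while differences across strong image edges are cheap, so predicted boundaries snap to high-contrast edges.

```python
# Sketch of a graph-cut-style training loss: cross-entropy plus a
# contrast-weighted penalty on soft label differences between neighbours.
import torch
import torch.nn.functional as F

def seg_loss(logits, target, image, lam=0.1, sigma=0.1):
    ce = F.binary_cross_entropy_with_logits(logits, target)
    p = torch.sigmoid(logits)
    # Cheap to cut where image contrast is high, expensive where it is flat.
    w_v = torch.exp(-((image[1:, :] - image[:-1, :]) ** 2) / (2 * sigma**2))
    w_h = torch.exp(-((image[:, 1:] - image[:, :-1]) ** 2) / (2 * sigma**2))
    cut = (w_v * (p[1:, :] - p[:-1, :]).abs()).sum() + \
          (w_h * (p[:, 1:] - p[:, :-1]).abs()).sum()
    return ce + lam * cut / p.numel()
```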

Role in foundation model training

With the rise of multimodal large language models (MLLMs) and vision foundation models (such as the Segment Anything Model), graph cuts have found a renewed utility in the data curation pipeline.

Training these large-scale models requires massive datasets of high-quality segmentation masks, which are prohibitively expensive to generate manually pixel-by-pixel. Graph cut algorithms are employed to scale this process via weak supervision:

Pseudo-label generation: Annotators provide cheap inputs (bounding boxes or text prompts), and graph cut algorithms propagate these cues to generate dense, pixel-perfect masks (ground truth) used to train the transformer models.
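A sketch of this propagation step; graph_cut_segment is a hypothetical stand-in for any seed-based min-cut segmenter (such as the construction sketched earlier), and the margin and seed geometry are an illustrative heuristic, not a production pipeline.

```python
# Weak-supervision sketch: a cheap bounding box becomes seeds, and a
# graph cut propagates them into a dense pseudo-label mask for training.
import numpy as np

def box_to_pseudolabel(image, box, graph_cut_segment, margin=10):
    x0, y0, x1, y1 = box
    seeds = np.full(image.shape[:2], -1)      # -1 = unconstrained pixel
    seeds[:max(y0 - margin, 0), :] = 0        # well outside the box: background
    seeds[y1 + margin:, :] = 0
    seeds[:, :max(x0 - margin, 0)] = 0
    seeds[:, x1 + margin:] = 0
    cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
    seeds[cy - 2:cy + 3, cx - 2:cx + 3] = 1   # box centre: foreground seed
    return graph_cut_segment(image, seeds)    # dense mask used as pseudo-label
```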

Software

References

  1. Adelson, Edward H. and Bergen, James R. (1991), "The plenoptic function and the elements of early vision", Computational Models of Visual Processing 1.2 (1991).
  2. Boykov, Y., Veksler, O. and Zabih, R. (2001), "Fast approximate energy minimization via graph cuts", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11): 1222–1239.
  3. Greig, D.M., Porteous, B.T. and Seheult, A.H. (1989), "Exact maximum a posteriori estimation for binary images", Journal of the Royal Statistical Society, Series B, 51, 271–279.
  4. Geman, D. and Geman, S. (1984), "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images", IEEE Trans. Pattern Anal. Mach. Intell., 6, 721–741.
  5. Besag, J.E. (1986), "On the statistical analysis of dirty pictures" (with discussion), Journal of the Royal Statistical Society, Series B, 48, 259–302.
  6. Boykov, Y. and Kolmogorov, V. (2004), "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision", IEEE Trans. Pattern Anal. Mach. Intell., 26(9): 1124–1137.
  7. "Computer Vision Awards – The Computer Vision Foundation". www.thecvf.com. Archived from the original on 2026-01-12. Retrieved 2026-01-27.
  8. Couprie, C., Grady, L., Najman, L. and Talbot, H. (2011), "Power Watersheds: A Unifying Graph-Based Optimization Framework", IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7): 1384–1399.
  9. Grady, L. and Alvino, C. (2009), "The Piecewise Smooth Mumford-Shah Functional on an Arbitrary Graph", IEEE Trans. on Image Processing, pp. 2547–2561.
  10. Boykov, Y. and Kolmogorov, V. (2003), "Computing Geodesics and Minimal Surfaces via Graph Cuts", Proc. of ICCV.
  11. Appleton, B. and Talbot, H. (2006), "Globally Minimal Surfaces by Continuous Maximal Flows", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 106–118.
  12. Sinop, A.K. and Grady, L. (2007), "A Seeded Image Segmentation Framework Unifying Graph Cuts and Random Walker Which Yields A New Algorithm", Proc. of ICCV.
  13. Kolmogorov, V. and Boykov, Y. (2005), "What Metrics Can Be Approximated by Geo-Cuts, or Global Optimization of Length/Area and Flux", Proc. of ICCV, pp. 564–571.
  14. Lermé, N., Malgouyres, F. and Létocart, L. (2010), "Reducing graphs in graph cut segmentation" (Archived 2012-03-27 at the Wayback Machine), Proc. of ICIP, pp. 3045–3048.
  15. Lombaert, H., Sun, Y., Grady, L. and Xu, C. (2005), "A Multilevel Banded Graph Cuts Method for Fast Image Segmentation", Proc. of ICCV, pp. 259–265.
  16. Li, Y., Sun, J., Tang, C.-K. and Shum, H.-Y. (2004), "Lazy Snapping", ACM Transactions on Graphics, pp. 303–308.
  17. Yim, P.J., "Method and System for Image Segmentation", United States Patent US8929636, January 6, 2016.
  18. Cederbaum, I. (1962), "On optimal operation of communication nets", Journal of the Franklin Institute, 274(2): 130–141. doi:10.1016/0016-0032(62)90401-5. ISSN 0016-0032.
  19. Frisch, I.T. (1969), "On Electrical analogs for flow networks", Proceedings of the IEEE, 57(2), pp. 209–210.