As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision [1]), such as image smoothing, the stereo correspondence problem, image segmentation, and object co-segmentation, among many other problems that can be formulated in terms of energy minimization.
Graph cut techniques are now increasingly used in combination with more general spatial artificial intelligence techniques (e.g. to enforce structure in large language model output in order to sharpen tumour boundaries, and similarly in various self-driving car, robotics and Google Maps applications).
Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph [2] (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution.
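As an illustration (notation assumed here rather than taken from a particular reference), a typical energy of this kind over a pixel labeling x = (x_p) has the form

    E(x) = \sum_{p \in P} D_p(x_p) + \sum_{(p,q) \in N} V_{pq}(x_p, x_q),

where D_p is a data (unary) term, V_{pq} is a smoothness (pairwise) term over neighbouring pixels, and the posterior satisfies Pr(x | data) ∝ exp(−E(x)), so a minimum-energy labeling is precisely a maximum a posteriori estimate.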
Although many computer vision algorithms involve cutting a graph (e.g. normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms).
"Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum.
The foundational theory of graph cuts was first applied in computer vision in a legendary 1989 paper by Margaret Greig, Bruce Porteous and Allan Seheult [3] (GPS) of Durham University. In the Bayesian statistical context of smoothing noisy (or corrupted) binary images, using a Markov random field as the image prior distribution to ensure spatial consistency, they showed with a deceptively simple argument how the maximum a posteriori estimate of a binary image can be obtained exactly: by maximizing the flow through an associated image network, involving the introduction of a source and sink. The problem was therefore shown to be efficiently and exactly solvable, an unexpected result given the size of the problem.
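A minimal sketch of this construction (written from scratch for illustration, not the original GPS formulation or code; the simple Edmonds-Karp max-flow routine below stands in for the specialised solvers used in practice):

# Sketch: exact MAP denoising of a binary image by max-flow/min-cut.
# Each pixel is a node; a source s and sink t are added. Terminal edges encode the
# per-pixel cost of disagreeing with the noisy observation, and edges between
# 4-neighbours encode an Ising-style smoothness penalty.
from collections import deque

def denoise_binary(noisy, unary_weight=1.0, pairwise_weight=0.6):
    h, w = len(noisy), len(noisy[0])
    n = h * w
    s, t = n, n + 1                       # source and sink node indices
    cap = [dict() for _ in range(n + 2)]  # cap[u][v] = residual capacity of edge u -> v

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)         # reverse (residual) edge

    for i in range(h):
        for j in range(w):
            p = i * w + j
            # Terminal links: cutting one of these means disagreeing with the observation.
            if noisy[i][j] == 1:
                add_edge(s, p, unary_weight)
            else:
                add_edge(p, t, unary_weight)
            # Neighbour links: penalty for assigning different labels to neighbours.
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < h and nj < w:
                    q = ni * w + nj
                    add_edge(p, q, pairwise_weight)
                    add_edge(q, p, pairwise_weight)

    # Edmonds-Karp max-flow: repeatedly augment along shortest (BFS) residual paths.
    def bfs():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return None

    while (parent := bfs()) is not None:
        v, bottleneck = t, float("inf")
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Pixels still reachable from the source in the residual graph receive label 1.
    reachable, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in reachable:
                reachable.add(v)
                queue.append(v)
    return [[1 if i * w + j in reachable else 0 for j in range(w)] for i in range(h)]

if __name__ == "__main__":
    noisy = [[1, 1, 0, 1],
             [1, 0, 1, 1],
             [0, 1, 1, 1],
             [1, 1, 1, 0]]
    print(denoise_binary(noisy))

Cutting a pixel's terminal edge corresponds to disagreeing with the noisy observation, and cutting a neighbour edge corresponds to a label change between adjacent pixels, so the value of the minimum cut equals the minimum of the Ising-style energy.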
Prior to this result, approximate local optimisation techniques such as simulated annealing (as proposed by the Geman brothers) [4] or iterated conditional modes (a type of greedy algorithm suggested by Julian Besag) [5] were used to solve such image smoothing problems. The GPS paper combined ideas from mathematical statistics (Bayes' theorem), physics (the Ising model), optimisation (energy functions) and computer science (network flow problems), and led the move away from approximate local optimisation approaches (e.g. simulated annealing) towards more powerful exact, or near-exact, global optimisation techniques.
Although GPS is now recognised as seminal, it was well ahead of its time and, in particular, was published years before the computing power revolution of Moore's law. Significantly, GPS was published in a mathematical statistics journal, rather than a computer vision one, and this led to it being overlooked by the computer vision community for many years. It is unofficially known as "The Velvet Underground" paper of computer vision (i.e. although very few computer vision people read the paper [bought the record], those that did started new research [formed a band]).
Although the general k-colour problem is NP-hard for k > 2, the GPS approach has turned out to have wide applicability in general computer vision problems. This was first demonstrated by Boykov, Veksler and Zabih, [2] who, in a seminal paper published more than 10 years after the original GPS paper, lit the blue touch paper for the general adoption of graph cut techniques in computer vision. They showed that, for general problems, the GPS approach can be applied iteratively to sequences of binary problems to yield near-optimal solutions.
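A toy sketch of this iterative idea (not Boykov, Veksler and Zabih's actual algorithmic machinery: here each binary "expansion" move is found by brute force on a tiny image purely for clarity, whereas in practice that binary subproblem is exactly what a single graph cut computes):

# Toy sketch of iterated binary moves ("alpha-expansion") for a multi-label Potts energy.
# For each label alpha, every pixel makes a binary choice: keep its current label or
# switch to alpha. The optimal binary move is brute-forced here; real implementations
# solve it exactly with a max-flow/min-cut computation.
from itertools import product

def energy(labels, data_cost, smooth_weight, h, w):
    e = sum(data_cost[i][j][labels[i * w + j]] for i in range(h) for j in range(w))
    for i in range(h):
        for j in range(w):
            if j + 1 < w and labels[i * w + j] != labels[i * w + j + 1]:
                e += smooth_weight
            if i + 1 < h and labels[i * w + j] != labels[(i + 1) * w + j]:
                e += smooth_weight
    return e

def alpha_expansion(data_cost, num_labels, smooth_weight=1.0, sweeps=3):
    h, w = len(data_cost), len(data_cost[0])
    # Start from the best purely per-pixel (unary) labeling.
    labels = [min(range(num_labels), key=lambda l: data_cost[i][j][l])
              for i in range(h) for j in range(w)]
    for _ in range(sweeps):
        for alpha in range(num_labels):
            best, best_e = labels, energy(labels, data_cost, smooth_weight, h, w)
            # Binary move: each pixel either keeps its label or takes alpha.
            for mask in product((False, True), repeat=h * w):
                cand = [alpha if m else l for m, l in zip(mask, labels)]
                e = energy(cand, data_cost, smooth_weight, h, w)
                if e < best_e:
                    best, best_e = cand, e
            labels = best
    return labels

if __name__ == "__main__":
    # 2 x 2 image, 3 labels; data_cost[i][j][l] is the cost of giving pixel (i, j) label l.
    data_cost = [[[0, 2, 2], [2, 0, 2]],
                 [[2, 0, 2], [2, 2, 0]]]
    print(alpha_expansion(data_cost, num_labels=3, smooth_weight=0.5))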
In 2011, Couprie et al. [6] proposed a general image segmentation framework, called the "Power Watershed", that minimizes a real-valued indicator function taking values in [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is carried out with respect to an exponent parameter. For particular choices of this exponent, the Power Watershed is optimized by graph cuts, by shortest paths, by the random walker algorithm, or by the watershed algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms.
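A rough sketch of the seeded objective behind this family (notation assumed here rather than quoted from the paper):

    \min_{x \in [0,1]^{V}} \sum_{e_{ij} \in E} w_{ij}^{p} \, |x_i - x_j|^{q}   subject to x = 1 on foreground seeds and x = 0 on background seeds,

with particular choices of the exponents recovering graph cuts, the random walker and shortest paths, and a limiting choice of the weight exponent recovering the watershed.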
where the energy is composed of two different models (E_color and E_coherence):
E_color — a unary term describing the likelihood of each color.
E_coherence — a binary (pairwise) term describing the coherence between neighboring pixels.
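Written out explicitly (in assumed notation consistent with the two terms above), for a labeling x = (x_p) the energy takes the form

    E(x) = \sum_{p} E_{color}(x_p) + \sum_{(p,q) \in N} E_{coherence}(x_p, x_q),

where N is the set of neighboring pixel pairs; a weighting factor balancing the two terms is often included.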
Graph cut methods have become popular alternatives to the level set-based approaches for optimizing the location of a contour (see [7] for an extensive comparison). However, graph cut approaches have been criticized in the literature for several issues:
The Boykov-Kolmogorov algorithm [15] is an efficient way to compute the max-flow for computer vision-related graphs.
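A minimal usage sketch, assuming the PyMaxflow Python package (which wraps a C++ Boykov-Kolmogorov max-flow implementation); the grid helper names below follow the package's documented API but should be checked against the current release:

# Binary segmentation of a tiny image with the "maxflow" (PyMaxflow) package.
import numpy as np
import maxflow

img = np.array([[1.0, 1.0, 0.0, 1.0],
                [1.0, 0.0, 1.0, 1.0],
                [0.0, 1.0, 1.0, 1.0]])

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes(img.shape)       # one graph node per pixel
g.add_grid_edges(nodes, 0.6)              # pairwise smoothness capacities (n-links)
g.add_grid_tedges(nodes, img, 1.0 - img)  # data-term capacities to source/sink (t-links)

flow = g.maxflow()                        # run Boykov-Kolmogorov max-flow
labels = np.int_(np.logical_not(g.get_grid_segments(nodes)))
print(flow)
print(labels)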
The Sim Cut algorithm [16] approximates the minimum graph cut by simulating an electrical network, the approach suggested by Cederbaum's maximum flow theorem. [17] [18] The algorithm can be accelerated through parallel computing.