PatchMatch

Figure: flowers at the bottom-right corner of the image are removed using PatchMatch.

The core PatchMatch algorithm quickly finds correspondences between small square regions (or patches) of an image. The algorithm can be used in various applications such as object removal from images, reshuffling or moving the contents of images, retargeting (changing the aspect ratio of images), optical flow estimation, and stereo correspondence.


Algorithm

The goal of the algorithm is to find the patch correspondence by defining a nearest-neighbor field (NNF) as a function $f : A \to \mathbb{R}^2$ of offsets, defined over all possible patch coordinates (locations of patch centers) in image $A$, for some distance function $D$ between two patches. So, for a given patch coordinate $a$ in image $A$ and its corresponding nearest neighbor $b$ in image $B$, $f(a)$ is simply $b - a$. However, exhaustively searching image $B$ for every patch in $A$ is far too expensive, so the algorithm instead takes a randomized approach to accelerate the computation. The algorithm has three main components. Initially, the nearest-neighbor field is filled with either random offsets or some prior information. Next, an iterative update process is applied to the NNF, in which good patch offsets are propagated to adjacent pixels, followed by a random search in the neighborhood of the best offset found so far. Independently of these three components, the algorithm also uses a coarse-to-fine approach, building an image pyramid to obtain a better result.
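To make these definitions concrete, the following is a minimal sketch in Python with NumPy of a sum-of-squared-differences patch distance $D$ and the offset representation of the NNF. It assumes grayscale images stored as 2D arrays and a patch radius `r` (patch side $2r + 1$); all names here are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def patch_distance(A, B, ay, ax, by, bx, r):
    """Sum of squared differences D between the (2r+1) x (2r+1) patch
    centered at (ay, ax) in A and the one centered at (by, bx) in B."""
    pa = A[ay - r:ay + r + 1, ax - r:ax + r + 1].astype(np.float64)
    pb = B[by - r:by + r + 1, bx - r:bx + r + 1].astype(np.float64)
    return float(np.sum((pa - pb) ** 2))

# The NNF is stored as an H x W x 2 integer array of offsets: for a patch
# center a = (y, x) in A with nearest neighbor b = (y', x') in B,
# nnf[y, x] = (y' - y, x' - x), i.e. f(a) = b - a.
```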

Initialization

When initializing the NNF with random offsets, we use independent uniform samples across the full range of image $B$. Relying only on an initial guess carried over from the previous level of the pyramid risks trapping the algorithm in local minima, which the random initialization helps to avoid.
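A random initialization of the offset field might look like the following sketch, under the same assumptions as above (the names are again illustrative):

```python
import numpy as np

def init_nnf(A, B, r, rng=None):
    """Fill the NNF with independent uniform random offsets by sampling
    a valid patch center in B for every patch center in A."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = A.shape[:2]
    hb, wb = B.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]                    # patch centers in A
    by = rng.integers(r, hb - r, size=(h, w))      # random centers in B
    bx = rng.integers(r, wb - r, size=(h, w))
    return np.stack([by - ys, bx - xs], axis=-1)   # f(a) = b - a
```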

Iteration

After initialization, the algorithm performs an iterative process of improving the NNF. Each iteration examines the offsets in scan order (from left to right, top to bottom), and each offset undergoes propagation followed by random search.

Propagation

We attempt to improve $f(x, y)$ using the known offsets of $f(x-1, y)$ and $f(x, y-1)$, assuming that neighboring patches are likely to have the same offset. That is, the algorithm takes the new value for $f(x, y)$ to be $\arg\min \{ D(f(x, y)), D(f(x-1, y)), D(f(x, y-1)) \}$, where $D(f(\cdot))$ denotes the patch distance obtained by applying that offset at $(x, y)$. So if $(x, y)$ has a correct mapping and is in a coherent region $R$, then all of $R$ below and to the right of $(x, y)$ will be filled with the correct mapping. Alternatively, on even iterations, the algorithm searches in the opposite direction, taking the new value to be $\arg\min \{ D(f(x, y)), D(f(x+1, y)), D(f(x, y+1)) \}$; a sketch of one propagation step follows below.
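Here is a sketch of a single propagation step, reusing `patch_distance` and the NNF layout from the sketches above; boundary handling is simplified, and the names remain illustrative.

```python
def propagate(A, B, nnf, y, x, r, step=-1):
    """Try the offsets of the left/above neighbors (step=-1) or the
    right/below neighbors (step=+1, used on even iterations)."""
    h, w = A.shape[:2]
    hb, wb = B.shape[:2]
    best = patch_distance(A, B, y, x, y + nnf[y, x, 0], x + nnf[y, x, 1], r)
    for dy, dx in ((0, step), (step, 0)):          # f(x-1, y) and f(x, y-1)
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            # Apply the neighbor's offset at the current pixel.
            by, bx = y + nnf[ny, nx, 0], x + nnf[ny, nx, 1]
            if r <= by < hb - r and r <= bx < wb - r:   # stay inside B
                d = patch_distance(A, B, y, x, by, bx, r)
                if d < best:
                    best = d
                    nnf[y, x] = nnf[ny, nx]
    return best
```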

Random search

Let $v_0 = f(x, y)$. We attempt to improve $f(x, y)$ by testing a sequence of candidate offsets at an exponentially decreasing distance from $v_0$:

$$u_i = v_0 + w \alpha^i R_i$$

where $R_i$ is a uniform random variable in $[-1, 1] \times [-1, 1]$, $w$ is a large window search radius, which is set to the maximum picture size, and $\alpha$ is a fixed ratio, often set to $1/2$. This part of the algorithm allows $f(x, y)$ to jump out of local minima through the random process.
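A matching sketch of the random-search step, again reusing the illustrative helpers above:

```python
import numpy as np

def random_search(A, B, nnf, y, x, r, alpha=0.5, rng=None):
    """Test candidates u_i = v0 + w * alpha**i * R_i, with R_i uniform
    in [-1, 1] x [-1, 1] and the radius w shrinking by alpha each step."""
    rng = np.random.default_rng() if rng is None else rng
    hb, wb = B.shape[:2]
    v0 = nnf[y, x].astype(np.float64)
    best = patch_distance(A, B, y, x, y + nnf[y, x, 0], x + nnf[y, x, 1], r)
    w_radius = float(max(hb, wb))                  # start at max picture size
    while w_radius >= 1.0:
        cand = v0 + w_radius * rng.uniform(-1.0, 1.0, size=2)   # u_i
        by, bx = int(y + cand[0]), int(x + cand[1])
        if r <= by < hb - r and r <= bx < wb - r:  # candidate inside B
            d = patch_distance(A, B, y, x, by, bx, r)
            if d < best:
                best = d
                nnf[y, x] = (by - y, bx - x)
        w_radius *= alpha                          # exponential decay
    return best
```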

Halting criterion

A commonly used halting criterion is a fixed number of iterations, typically about 4 to 5. Even with this low iteration count, the algorithm works well in practice.
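Putting the pieces together, a driver with such a fixed iteration count might look like the following sketch, which reuses the illustrative helpers defined above:

```python
import numpy as np

def patchmatch(A, B, r=3, iters=5, rng=None):
    """Random initialization, then `iters` passes of propagation plus
    random search, alternating the scan direction each pass."""
    rng = np.random.default_rng() if rng is None else rng
    nnf = init_nnf(A, B, r, rng)
    h, w = A.shape[:2]
    for it in range(iters):
        # The first pass scans top-left to bottom-right and propagates
        # from the left/above; the next pass reverses both, and so on.
        step = -1 if it % 2 == 0 else 1
        ys = range(r, h - r) if step == -1 else range(h - r - 1, r - 1, -1)
        xs = range(r, w - r) if step == -1 else range(w - r - 1, r - 1, -1)
        for y in ys:
            for x in xs:
                propagate(A, B, nnf, y, x, r, step)
                random_search(A, B, nnf, y, x, r, rng=rng)
    return nnf
```

For example, calling `patchmatch(img_a, img_b)` on two grayscale arrays returns an offset field from which, for each patch center $a$ in `img_a`, the matching patch center in `img_b` can be read off as $b = a + f(a)$.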

Conclusion

This is an efficient algorithm in practice, taking only a few seconds per edit on a test computer with an Intel Core i5 CPU running Photoshop CS5.
