Segmentation-based object categorization

The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation, applying graph partitioning via minimum cut or maximum cut. Segmentation-based object categorization can be viewed as a specific case of spectral clustering applied to image segmentation.

Applications of image segmentation

Image segmentation supports, for example, image compression, where the image is first segmented into homogeneous components so that the most suitable compression algorithm can be applied to each component; medical diagnosis, such as the automatic segmentation of MRI images to identify cancerous regions; mapping and measurement, such as the automatic analysis of remote sensing data from satellites to identify and measure regions of interest; and transportation, where partitioning a transportation network makes it possible to identify regions characterized by homogeneous traffic states. [1]

Segmentation using normalized cuts

Graph theoretic formulation

The set of points in an arbitrary feature space can be represented as a weighted undirected complete graph $G = (V, E)$, where the nodes of the graph are the points in the feature space. The weight $w_{ij}$ of an edge $(i, j) \in E$ is a function of the similarity between the nodes $i$ and $j$. In this context, we can formulate the image segmentation problem as a graph partitioning problem that asks for a partition $V_1, \ldots, V_k$ of the vertex set $V$, where, according to some measure, the vertices in any set $V_i$ have high similarity, and the vertices in two different sets $V_i, V_j$ have low similarity.
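
For concreteness, the sketch below (an illustration, not part of the formulation above) builds such a similarity graph for a grayscale image, using a common choice of Gaussian affinity on intensity difference and spatial distance within a cutoff radius; the parameters r, sigma_i and sigma_x are illustrative assumptions.

    import numpy as np
    from scipy.sparse import lil_matrix

    def similarity_graph(image, r=5, sigma_i=0.1, sigma_x=4.0):
        """Sparse affinity matrix W for a grayscale image with values in [0, 1].

        Pixels are graph nodes, indexed row-major. For pixels p, q closer than r,
        w_pq = exp(-(I_p - I_q)^2 / sigma_i^2) * exp(-dist^2 / sigma_x^2);
        all other entries are zero, which keeps W sparse.
        """
        h, w = image.shape
        n = h * w
        W = lil_matrix((n, n))
        for i in range(h):
            for j in range(w):
                p = i * w + j
                # Scan only the neighborhood of radius r around pixel (i, j).
                for di in range(-r, r + 1):
                    for dj in range(-r, r + 1):
                        qi, qj = i + di, j + dj
                        dist2 = di * di + dj * dj
                        if 0 <= qi < h and 0 <= qj < w and 0 < dist2 <= r * r:
                            q = qi * w + qj
                            dI = image[i, j] - image[qi, qj]
                            W[p, q] = (np.exp(-dI ** 2 / sigma_i ** 2)
                                       * np.exp(-dist2 / sigma_x ** 2))
        return W.tocsr()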

Normalized cuts

Let $G = (V, E, w)$ be a weighted graph. Let $X$ and $Y$ be two subsets of vertices.

Let:

$$w(X, Y) = \sum_{x \in X, y \in Y} w(x, y)$$

$$\operatorname{ncut}(X, Y) = \frac{w(X, Y)}{w(X, V)} + \frac{w(X, Y)}{w(Y, V)}$$

$$\operatorname{nassoc}(X, Y) = \frac{w(X, X)}{w(X, V)} + \frac{w(Y, Y)}{w(Y, V)}$$

In the normalized cuts approach,[2] for any cut $(S, \bar{S})$ in $G$, $\operatorname{ncut}(S, \bar{S})$ measures the similarity between different parts, and $\operatorname{nassoc}(S, \bar{S})$ measures the total similarity of vertices in the same part.

Since $\operatorname{ncut}(S, \bar{S}) = 2 - \operatorname{nassoc}(S, \bar{S})$, a cut that minimizes $\operatorname{ncut}(S, \bar{S})$ also maximizes $\operatorname{nassoc}(S, \bar{S})$.

Computing a cut $(S^*, \bar{S}^*)$ that minimizes $\operatorname{ncut}(S, \bar{S})$ is an NP-hard problem. However, we can find in polynomial time a cut $(S, \bar{S})$ of small normalized weight $\operatorname{ncut}(S, \bar{S})$ using spectral techniques.
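
Both quantities are easy to compute for a given partition. Below is a minimal sketch, assuming a dense symmetric affinity matrix W and a partition encoded as a boolean mask; it also verifies the identity $\operatorname{ncut} = 2 - \operatorname{nassoc}$ used above.

    import numpy as np

    def cut_weights(W, mask):
        """Return w(S, S_bar), w(S, V), w(S_bar, V) for the partition given by mask."""
        S, T = mask, ~mask
        return W[np.ix_(S, T)].sum(), W[S, :].sum(), W[T, :].sum()

    def ncut(W, mask):
        w_ST, w_SV, w_TV = cut_weights(W, mask)
        return w_ST / w_SV + w_ST / w_TV

    def nassoc(W, mask):
        S, T = mask, ~mask
        _, w_SV, w_TV = cut_weights(W, mask)
        return W[np.ix_(S, S)].sum() / w_SV + W[np.ix_(T, T)].sum() / w_TV

    # Sanity check of ncut = 2 - nassoc on a random symmetric affinity matrix.
    rng = np.random.default_rng(0)
    A = rng.random((6, 6))
    W = (A + A.T) / 2
    np.fill_diagonal(W, 0)
    mask = np.array([True, True, True, False, False, False])
    assert np.isclose(ncut(W, mask), 2 - nassoc(W, mask))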

The ncut algorithm

Let:

$$d(i) = \sum_j w_{ij}$$

Also, let $D$ be an $n \times n$ diagonal matrix with $d$ on the diagonal, and let $W$ be an $n \times n$ symmetric matrix with $W_{ij} = w_{ij}$.

After some algebraic manipulations, we get:

$$\min_{(S, \bar{S})} \operatorname{ncut}(S, \bar{S}) = \min_y \frac{y^T (D - W) y}{y^T D y},$$

subject to the constraints:

$$y_i \in \{1, -b\} \text{ for some constant } b, \qquad y^T D \mathbf{1} = 0.$$

Minimizing $\frac{y^T (D - W) y}{y^T D y}$ subject to the constraints above is NP-hard. To make the problem tractable, we relax the constraints on $y$ and allow it to take real values. The relaxed problem can be solved by solving the generalized eigenvalue problem $(D - W) y = \lambda D y$ for the second smallest generalized eigenvalue.

The partitioning algorithm (a code sketch follows the list):

  1. Given a set of features, set up a weighted graph $G = (V, E)$, compute the weight of each edge, and summarize the information in $D$ and $W$.
  2. Solve $(D - W) y = \lambda D y$ for eigenvectors with the smallest eigenvalues.
  3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph (e.g. grouping according to sign).
  4. Decide if the current partition should be subdivided.
  5. Recursively partition the segmented parts, if necessary.
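
A minimal dense sketch of one bipartition step follows, using SciPy's generalized symmetric eigensolver; the toy affinity matrix is an illustrative assumption, and real images require the sparse, matrix-free solvers discussed in the next section.

    import numpy as np
    from scipy.linalg import eigh

    def ncut_bipartition(W):
        """One level of the ncut algorithm: bipartition by the sign of the
        eigenvector for the second smallest eigenvalue of (D - W) y = lambda D y."""
        d = W.sum(axis=1)
        D = np.diag(d)
        L = D - W                # graph Laplacian
        vals, vecs = eigh(L, D)  # generalized eigenproblem, eigenvalues ascending
        y = vecs[:, 1]           # eigenvector for the second smallest eigenvalue
        return y >= 0            # grouping according to sign

    # Toy affinity matrix: two tight clusters {0, 1, 2} and {3, 4, 5}
    # connected by weak cross edges.
    W = np.array([[0.00, 1.00, 1.00, 0.01, 0.00, 0.00],
                  [1.00, 0.00, 1.00, 0.00, 0.00, 0.00],
                  [1.00, 1.00, 0.00, 0.00, 0.01, 0.00],
                  [0.01, 0.00, 0.00, 0.00, 1.00, 1.00],
                  [0.00, 0.00, 0.01, 1.00, 0.00, 1.00],
                  [0.00, 0.00, 0.00, 1.00, 1.00, 0.00]])
    print(ncut_bipartition(W))   # separates {0, 1, 2} from {3, 4, 5} (sign is arbitrary)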

Computational complexity

Solving a standard eigenvalue problem for all eigenvectors (using the QR algorithm, for instance) takes $O(n^3)$ time. This is impractical for image segmentation applications where $n$ is the number of pixels in the image.

Since only one eigenvector, corresponding to the second smallest generalized eigenvalue, is used by the ncut algorithm, efficiency can be dramatically improved if the corresponding eigenvalue problem is solved in a matrix-free fashion, i.e., without explicitly manipulating or even computing the matrix $W$, as, e.g., in the Lanczos algorithm. Matrix-free methods require only a function that performs a matrix-vector product for a given vector on every iteration. For image segmentation, the matrix $W$ is typically sparse, with $O(n)$ nonzero entries, so such a matrix-vector product takes $O(n)$ time.

For high-resolution images, the second eigenvalue is often ill-conditioned, leading to slow convergence of iterative eigenvalue solvers, such as the Lanczos algorithm. Preconditioning is a key technique for accelerating the convergence, e.g., in the matrix-free LOBPCG method. Computing the eigenvector using an optimally preconditioned matrix-free method takes $O(n)$ time, which is the optimal complexity, since the eigenvector has $n$ components.
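
As an illustration of the matrix-free approach, the sketch below computes this eigenvector with SciPy's LOBPCG, supplying only a matrix-vector product for $D - W$; the initial block, tolerance, and the random test matrix are illustrative choices, and in practice a preconditioner (the M argument, e.g. algebraic multigrid) would be passed to obtain the fast convergence described above.

    import numpy as np
    from scipy.sparse import diags, random as sparse_random
    from scipy.sparse.linalg import LinearOperator, lobpcg

    def fiedler_vector_matrix_free(W, seed=0):
        """Eigenvector for the second smallest eigenvalue of (D - W) y = lambda D y.

        The solver touches L = D - W only through the matvec below, so L never
        needs to be formed explicitly. Assumes every vertex has positive degree.
        """
        n = W.shape[0]
        d = np.asarray(W.sum(axis=1)).ravel()   # vertex degrees
        L = LinearOperator((n, n), dtype=float,
                           matvec=lambda x: d * x.ravel() - W @ x.ravel())
        B = diags(d)                             # the matrix D of the pencil
        X = np.random.default_rng(seed).standard_normal((n, 2))
        # M=<preconditioner> would go here to accelerate convergence.
        vals, vecs = lobpcg(L, X, B=B, largest=False, tol=1e-8, maxiter=500)
        return vecs[:, np.argsort(vals)[1]]

    # Illustrative use on a random sparse nonnegative affinity matrix.
    A = sparse_random(400, 400, density=0.05, random_state=1)
    W = A + A.T                                  # symmetrize
    labels = fiedler_vector_matrix_free(W) >= 0  # sign-based bipartition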

Software implementations

scikit-learn [3] uses LOBPCG from SciPy with algebraic multigrid preconditioning for solving the eigenvalue problem for the graph Laplacian to perform image segmentation via spectral graph partitioning, as first proposed in [4] and actually tested in [5] and [6].
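
A minimal usage sketch along the lines of the scikit-learn documentation follows; the toy image and parameter choices are illustrative assumptions. Passing eigen_solver="amg" (which requires the optional pyamg package) selects the multigrid-preconditioned LOBPCG variant described above, while "lobpcg" runs unpreconditioned.

    import numpy as np
    from sklearn.feature_extraction.image import img_to_graph
    from sklearn.cluster import spectral_clustering

    # Toy image: a bright square on a dark, slightly noisy background.
    img = np.zeros((50, 50))
    img[15:35, 15:35] = 1.0
    img += 0.05 * np.random.default_rng(0).standard_normal(img.shape)

    # Affinity graph from pixel-to-pixel gradients, turned into similarities.
    graph = img_to_graph(img)
    graph.data = np.exp(-graph.data / graph.data.std())

    # Spectral graph partitioning into two segments.
    labels = spectral_clustering(graph, n_clusters=2,
                                 eigen_solver="lobpcg", random_state=0)
    segmentation = labels.reshape(img.shape)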

OBJ CUT

OBJ CUT [7] is an efficient method that automatically segments an object. The OBJ CUT method is a generic method, and therefore it is applicable to any object category model. Given an image D containing an instance of a known object category, e.g. cows, the OBJ CUT algorithm computes a segmentation of the object, that is, it infers a set of labels m.

Let m be a set of binary labels, and let $\Theta$ be a shape parameter ($\Theta$ is a shape prior on the labels from a layered pictorial structure (LPS) model). An energy function $E(m, \Theta)$ is defined as follows:

$$E(m, \Theta) = \sum \big( \phi_x(D \mid m_x) + \phi_x(m_x \mid \Theta) \big) + \sum \big( \Psi_{xy}(m_x, m_y) + \phi(D \mid m_x, m_y) \big) \quad (1)$$

The term $\phi_x(D \mid m_x) + \phi_x(m_x \mid \Theta)$ is called a unary term, and the term $\Psi_{xy}(m_x, m_y) + \phi(D \mid m_x, m_y)$ is called a pairwise term. A unary term consists of the likelihood $\phi_x(D \mid m_x)$ based on color, and the unary potential $\phi_x(m_x \mid \Theta)$ based on the distance from $\Theta$. A pairwise term consists of a prior $\Psi_{xy}(m_x, m_y)$ and a contrast term $\phi(D \mid m_x, m_y)$.

The best labeling $m^*$ minimizes $\sum_i w_i E(m, \Theta_i)$, where $w_i$ is the weight of the parameter $\Theta_i$:

$$m^* = \arg\min_m \sum_i w_i E(m, \Theta_i) \quad (2)$$

Algorithm

  1. Given an image D, an object category is chosen, e.g. cows or horses.
  2. The corresponding LPS model is matched to D to obtain the samples $\Theta_1, \ldots, \Theta_s$.
  3. The objective function given by equation (2) is determined by computing $E(m, \Theta_i)$ and using $w_i = g(\Theta_i \mid Z)$.
  4. The objective function is minimized using a single MINCUT operation to obtain the segmentation m (illustrated by the sketch after this list).
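
The sketch below illustrates that final step on a generic submodular binary energy with Potts pairwise terms, using networkx; it shows the standard s-t graph construction for exact minimization by a single mincut, not the exact graph built in the OBJ CUT paper, and the toy costs are illustrative assumptions.

    import networkx as nx

    def mincut_labels(unary, pairwise):
        """Exact minimizer of E(m) = sum_x U_x(m_x) + sum_{xy} P_xy [m_x != m_y].

        unary:    list of nonnegative (cost_label0, cost_label1) pairs, one per pixel
        pairwise: dict {(x, y): penalty} paid when the labels of x and y differ
        """
        G = nx.DiGraph()
        s, t = "s", "t"
        for x, (c0, c1) in enumerate(unary):
            G.add_edge(s, x, capacity=c1)  # cut iff x lands on the sink side (label 1)
            G.add_edge(x, t, capacity=c0)  # cut iff x lands on the source side (label 0)
        for (x, y), p in pairwise.items():
            G.add_edge(x, y, capacity=p)   # cut iff x and y end up on different sides
            G.add_edge(y, x, capacity=p)
        _, (source_side, sink_side) = nx.minimum_cut(G, s, t)
        return [1 if x in sink_side else 0 for x in range(len(unary))]

    # Three pixels: ends with opposite color preferences, ambiguous middle.
    unary = [(5.0, 1.0), (3.0, 2.0), (1.0, 5.0)]
    pairwise = {(0, 1): 3.0, (1, 2): 3.0}
    print(mincut_labels(unary, pairwise))  # [1, 1, 0], total energy 1 + 2 + 1 + 3 = 7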

Other approaches

Related methods for combined segmentation and recognition include class-specific, top-down segmentation, [8] image parsing, [9] the implicit shape model for combined object categorization and segmentation, [10] LOCUS, [11] and the layout consistent random field. [12]

Related Research Articles

<span class="mw-page-title-main">Pauli matrices</span> Matrices important in quantum mechanics and the study of spin

In mathematical physics and mathematics, the Pauli matrices are a set of three 2 × 2 complex matrices that are Hermitian, involutory and unitary. Usually indicated by the Greek letter sigma, they are occasionally denoted by tau when used in connection with isospin symmetries.

<span class="mw-page-title-main">Ellipsoid</span> Quadric surface that looks like a deformed sphere

An ellipsoid is a surface that can be obtained from a sphere by deforming it by means of directional scalings, or more generally, of an affine transformation.

Ray transfer matrix analysis is a mathematical form for performing ray tracing calculations in sufficiently simple problems which can be solved considering only paraxial rays. Each optical element is described by a 2×2 ray transfer matrix which operates on a vector describing an incoming light ray to calculate the outgoing ray. Multiplication of the successive matrices thus yields a concise ray transfer matrix describing the entire optical system. The same mathematics is also used in accelerator physics to track particles through the magnet installations of a particle accelerator, see electron optics.

In mechanics and geometry, the 3D rotation group, often denoted SO(3), is the group of all rotations about the origin of three-dimensional Euclidean space under the operation of composition.

In mathematics, a self-adjoint operator on an infinite-dimensional complex vector space V with inner product is a linear map A that is its own adjoint. If V is finite-dimensional with a given orthonormal basis, this is equivalent to the condition that the matrix of A is a Hermitian matrix, i.e., equal to its conjugate transpose A. By the finite-dimensional spectral theorem, V has an orthonormal basis such that the matrix of A relative to this basis is a diagonal matrix with entries in the real numbers. This article deals with applying generalizations of this concept to operators on Hilbert spaces of arbitrary dimension.

In linear algebra, a tridiagonal matrix is a band matrix that has nonzero elements only on the main diagonal, the subdiagonal/lower diagonal, and the supradiagonal/upper diagonal. For example, the following matrix is tridiagonal:

<span class="mw-page-title-main">Cramér–Rao bound</span> Lower bound on variance of an estimator

In estimation theory and statistics, the Cramér–Rao bound (CRB) relates to estimation of a deterministic parameter. The result is named in honor of Harald Cramér and C. R. Rao, but has also been derived independently by Maurice Fréchet, Georges Darmois, and by Alexander Aitken and Harold Silverstone. It states that the precision of any unbiased estimator is at most the Fisher information; or (equivalently) the reciprocal of the Fisher information is a lower bound on its variance.

In theoretical physics, a supermultiplet is a representation of a supersymmetry algebra, possibly with extended supersymmetry.

<span class="mw-page-title-main">Bloch sphere</span> Geometrical representation of the pure state space of a two-level quantum mechanical system

In quantum mechanics and computing, the Bloch sphere is a geometrical representation of the pure state space of a two-level quantum mechanical system (qubit), named after the physicist Felix Bloch.

This is a list of some vector calculus formulae for working with common curvilinear coordinate systems.

In linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, using the convention below, the matrix

<span class="mw-page-title-main">Euler's rotation theorem</span> Movement with a fixed point is rotation

In geometry, Euler's rotation theorem states that, in three-dimensional space, any displacement of a rigid body such that a point on the rigid body remains fixed, is equivalent to a single rotation about some axis that runs through the fixed point. It also means that the composition of two rotations is also a rotation. Therefore the set of rotations has a group structure, known as a rotation group.

The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the "most useful" eigenvalues and eigenvectors of an Hermitian matrix, where is often but not necessarily much smaller than . Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.

In geometry, various formalisms exist to express a rotation in three dimensions as a mathematical transformation. In physics, this concept is applied to classical mechanics where rotational kinematics is the science of quantitative description of a purely rotational motion. The orientation of an object at a given instant is described with the same tools, as it is defined as an imaginary rotation from a reference placement in space, rather than an actually observed rotation from a previous placement in space.

<span class="mw-page-title-main">Spectral clustering</span> Clustering methods

In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

In mathematics, the spectral theory of ordinary differential equations is the part of spectral theory concerned with the determination of the spectrum and eigenfunction expansion associated with a linear ordinary differential equation. In his dissertation, Hermann Weyl generalized the classical Sturm–Liouville theory on a finite closed interval to second order differential operators with singularities at the endpoints of the interval, possibly semi-infinite or infinite. Unlike the classical case, the spectrum may no longer consist of just a countable set of eigenvalues, but may also contain a continuous part. In this case the eigenfunction expansion involves an integral over the continuous part with respect to a spectral measure, given by the Titchmarsh–Kodaira formula. The theory was put in its final simplified form for singular differential equations of even degree by Kodaira and others, using von Neumann's spectral theorem. It has had important applications in quantum mechanics, operator theory and harmonic analysis on semisimple Lie groups.

EigenMoments is a set of orthogonal, noise robust, invariant to rotation, scaling and translation and distribution sensitive moments. Their application can be found in signal processing and computer vision as descriptors of the signal or image. The descriptors can later be used for classification purposes.

<span class="mw-page-title-main">Diffusion map</span>

Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Different from linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps are part of the family of nonlinear dimensionality reduction methods which focus on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps give a global description of the data-set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.

In machine learning, automatic basis function construction is the mathematical method of looking for a set of task-independent basis functions that map the state space to a lower-dimensional embedding, while still representing the value function accurately. Automatic basis construction is independent of prior knowledge of the domain, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create.

In statistics, the variance function is a smooth function that depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.

References

  1. Lopez, Clélia; Leclercq, Ludovic; Krishnakumari, Panchamy; Chiabaut, Nicolas; Van Lint, Hans (25 October 2017). "Revealing the day-to-day regularity of urban congestion patterns with 3D speed maps". Scientific Reports. 7: 14029. Bibcode:2017NatSR...714029L. doi:10.1038/s41598-017-14237-8. PMC 5656590. PMID 29070859.
  2. Jianbo Shi and Jitendra Malik (1997): "Normalized Cuts and Image Segmentation", IEEE Conference on Computer Vision and Pattern Recognition, pp. 731–737.
  3. "Spectral Clustering — scikit-learn documentation".
  4. Knyazev, Andrew V. (2003). Boley; Dhillon; Ghosh; Kogan (eds.). "Modern preconditioned eigensolvers for spectral image segmentation and graph bisection". Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003), Melbourne, Florida. IEEE Computer Society. pp. 59–62.
  5. Knyazev, Andrew V. (2006). "Multiscale Spectral Image Segmentation: Multiscale preconditioning for computing eigenvalues of graph Laplacians in image segmentation". Fast Manifold Learning Workshop, College of William & Mary, Williamsburg, VA. doi:10.13140/RG.2.2.35280.02565.
  6. Knyazev, Andrew V. (2006). "Multiscale Spectral Graph Partitioning and Image Segmentation". Workshop on Algorithms for Modern Massive Datasets, Stanford University and Yahoo! Research.
  7. M. P. Kumar, P. H. S. Torr, and A. Zisserman. OBJ CUT. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pages 18–25, 2005.
  8. E. Borenstein, S. Ullman: Class-specific, top-down segmentation. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 109–124, 2002.
  9. Z. Tu, X. Chen, A. L. Yuille, S. C. Zhu: Image Parsing: Unifying Segmentation, Detection, and Recognition. Toward Category-Level Object Recognition 2006: 545–576
  10. B. Leibe, A. Leonardis, B. Schiele: An Implicit Shape Model for Combined Object Categorization and Segmentation. Toward Category-Level Object Recognition 2006: 508–524
  11. J. Winn, N. Jojic. LOCUS: Learning object classes with unsupervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, 2005.
  12. J. M. Winn, J. Shotton: The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR (1) 2006: 37–44