Contextual image classification

Last updated

Contextual image classification, a topic of pattern recognition in computer vision, is an approach of classification based on contextual information in images. "Contextual" means this approach is focusing on the relationship of the nearby pixels, which is also called neighbourhood. The goal of this approach is to classify the images by using the contextual information.

Contents

Introduction

Similar as processing language, a single word may have multiple meanings unless the context is provided, and the patterns within the sentences are the only informative segments we care about. For images, the principle is same. Find out the patterns and associate proper meanings to them.

As the image illustrated below, if only a small portion of the image is shown, it is very difficult to tell what the image is about.

Mouth 01 mouth.png
Mouth

Even try another portion of the image, it is still difficult to classify the image.

Left eye 02 left eye.png
Left eye

However, if we increase the contextual of the image, then it makes more sense to recognize.

Increased field of smiling face 03 increased image of smiling face.png
Increased field of smiling face

As the full images shows below, almost everyone can classify it easily.

Full image Full image.png
Full image

During the procedure of segmentation, the methods which do not use the contextual information are sensitive to noise and variations, thus the result of segmentation will contain a great deal of misclassified regions, and often these regions are small (e.g., one pixel).

Compared to other techniques, this approach is robust to noise and substantial variations for it takes the continuity of the segments into account.

Several methods of this approach will be described below.

Applications

Functioning as a post-processing filter to a labelled image

This approach is very effective against small regions caused by noise. And these small regions are usually formed by few pixels or one pixel. The most probable label is assigned to these regions. However, there is a drawback of this method. The small regions also can be formed by correct regions rather than noise, and in this case the method is actually making the classification worse. This approach is widely used in remote sensing applications.

Improving the post-processing classification

This is a two-stage classification process:

  1. For each pixel, label the pixel and form a new feature vector for it.
  2. Use the new feature vector and combine the contextual information to assign the final label to the

Merging the pixels in earlier stages

Instead of using single pixels, the neighbour pixels can be merged into homogeneous regions benefiting from contextual information. And provide these regions to classifier.

Acquiring pixel feature from neighbourhood

The original spectral data can be enriched by adding the contextual information carried by the neighbour pixels, or even replaced in some occasions. This kind of pre-processing methods are widely used in textured image recognition. The typical approaches include mean values, variances, texture description, etc.

Combining spectral and spatial information

The classifier uses the grey level and pixel neighbourhood (contextual information) to assign labels to pixels. In such case the information is a combination of spectral and spatial information.

Powered by the Bayes minimum error classifier

Contextual classification of image data is based on the Bayes minimum error classifier (also known as a naive Bayes classifier).

Present the pixel:

here denotes the assigned class.

The neighbourhood: Size of the neighbourhood. There is no limitation of the size, but it is considered to be relatively small for each pixel . A reasonable size of neighbourhood would be of 4-connectivity or 8-connectivity ( is marked as red and placed in the centre).

The calculation:

Apply the minimum error classification on a pixel , if the probability of a class being presenting the pixel is the highest among all, then assign as its class.

The contextual classification rule is described as below, it uses the feature vector rather than .

Use the Bayes formula to calculate the posteriori probability

The number of vectors is the same as the number of pixels in the image. For the classifier uses a vector corresponding to each pixel , and the vector is generated from the pixel's neighbourhood.

The basic steps of contextual image classification:

  1. Calculate the feature vector for each pixel.
  2. Calculate the parameters of probability distribution and
  3. Calculate the posterior probabilities and all labels . Get the image classification result.

Algorithms

Template matching

The template matching is a "brute force" implementation of this approach. [1] The concept is first create a set of templates, and then look for small parts in the image match with a template.

This method is computationally high and inefficient. It keeps an entire templates list during the whole process and the number of combinations is extremely high. For a pixel image, there could be a maximum of combinations, which leads to high computation. This method is a top down method and often called table look-up or dictionary look-up.

Lower-order Markov chain

The Markov chain [2] also can be applied in pattern recognition. The pixels in an image can be recognised as a set of random variables, then use the lower order Markov chain to find the relationship among the pixels. The image is treated as a virtual line, and the method uses conditional probability.

Hilbert space-filling curves

The Hilbert curve runs in a unique pattern through the whole image, it traverses every pixel without visiting any of them twice and keeps a continuous curve. It is fast and efficient.

Markov meshes

The lower-order Markov chain and Hilbert space-filling curves mentioned above are treating the image as a line structure. The Markov meshes however will take the two dimensional information into account.

Dependency tree

The dependency tree [3] is a method using tree dependency to approximate probability distributions.

Related Research Articles

Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power. These activities can be viewed as two facets of the same field of application, and they have undergone substantial development over the past few decades.

In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a scalar function on Euclidean space. It is usually denoted by the symbols , (where is the nabla operator), or . In a Cartesian coordinate system, the Laplacian is given by the sum of second partial derivatives of the function with respect to each independent variable. In other coordinate systems, such as cylindrical and spherical coordinates, the Laplacian also has a useful form. Informally, the Laplacian Δf (p) of a function f at a point p measures by how much the average value of f over small spheres or balls centered at p deviates from f (p).

<span class="mw-page-title-main">Lorentz group</span> Lie group of Lorentz transformations

In physics and mathematics, the Lorentz group is the group of all Lorentz transformations of Minkowski spacetime, the classical and quantum setting for all (non-gravitational) physical phenomena. The Lorentz group is named for the Dutch physicist Hendrik Lorentz.

In probability theory and statistics, a Gaussian process is a stochastic process, such that every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.

In mathematics, and specifically differential geometry, a connection form is a manner of organizing the data of a connection using the language of moving frames and differential forms.

In physics, the Hamilton–Jacobi equation, named after William Rowan Hamilton and Carl Gustav Jacob Jacobi, is an alternative formulation of classical mechanics, equivalent to other formulations such as Newton's laws of motion, Lagrangian mechanics and Hamiltonian mechanics. The Hamilton–Jacobi equation is particularly useful in identifying conserved quantities for mechanical systems, which may be possible even when the mechanical problem itself cannot be solved completely.

In mathematics, a π-system on a set is a collection of certain subsets of such that

The Franz–Keldysh effect is a change in optical absorption by a semiconductor when an electric field is applied. The effect is named after the German physicist Walter Franz and Russian physicist Leonid Keldysh.

In many-body theory, the term Green's function is sometimes used interchangeably with correlation function, but refers specifically to correlators of field operators or creation and annihilation operators.

Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes and hence also belongs to the class of margin-maximizing supervised classification algorithms. Consider a classification function

One-shot learning is an object categorization problem, found mostly in computer vision. Whereas most machine learning-based object categorization algorithms require training on hundreds or thousands of examples, one-shot learning aims to classify objects from one, or only a few, examples. The term few-shot learning is also used for these problems, especially when more than one example is needed.

<span class="mw-page-title-main">Linear flow on the torus</span>

In mathematics, especially in the area of mathematical analysis known as dynamical systems theory, a linear flow on the torus is a flow on the n-dimensional torus

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods.

In optics, the Fraunhofer diffraction equation is used to model the diffraction of waves when the diffraction pattern is viewed at a long distance from the diffracting object, and also when it is viewed at the focal plane of an imaging lens.

Calculations in the Newman–Penrose (NP) formalism of general relativity normally begin with the construction of a complex null tetrad, where is a pair of real null vectors and is a pair of complex null vectors. These tetrad vectors respect the following normalization and metric conditions assuming the spacetime signature

<span class="mw-page-title-main">Symmetry in quantum mechanics</span> Properties underlying modern physics

Symmetries in quantum mechanics describe features of spacetime and particles which are unchanged under some transformation, in the context of quantum mechanics, relativistic quantum mechanics and quantum field theory, and with applications in the mathematical formulation of the standard model and condensed matter physics. In general, symmetry in physics, invariance, and conservation laws, are fundamentally important constraints for formulating physical theories and models. In practice, they are powerful methods for solving problems and predicting what can happen. While conservation laws do not always give the answer to the problem directly, they form the correct constraints and the first steps to solving a multitude of problems.

Bayesian hierarchical modelling is a statistical model written in multiple levels that estimates the parameters of the posterior distribution using the Bayesian method. The sub-models combine to form the hierarchical model, and Bayes' theorem is used to integrate them with the observed data and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, as additional evidence on the prior distribution is acquired.

<span class="mw-page-title-main">Event detection for WSN</span>

This article is about event detection for WSN.

The product of exponentials (POE) method is a robotics convention for mapping the links of a spatial kinematic chain. It is an alternative to Denavit–Hartenberg parameterization. While the latter method uses the minimal number of parameters to represent joint motions, the former method has a number of advantages: uniform treatment of prismatic and revolute joints, definition of only two reference frames, and an easy geometric interpretation from the use of screw axes for each joint.

Paden–Kahan subproblems are a set of solved geometric problems which occur frequently in inverse kinematics of common robotic manipulators. Although the set of problems is not exhaustive, it may be used to simplify inverse kinematic analysis for many industrial robots.

References

  1. G.T. Toussaint, "The Use of Context in Pattern Recognition," Pattern Recognition, vol. 10, 1977, pp. 189–204.
  2. K. Abend, T.J. Harley, and L.N. Kanal, "Classification of Binary Random Patterns," IEEE Transactions on Information Theory, vol. 11, no. 4, October 1965, pp. 538–544.
  3. C.K. Chow and C.N. Liu, "Approximating Discrete Probability Distributions with Dependence Trees," IEEE Transactions on Information Theory, vol.14, no. 3, May 1965, pp. 462–467.