Scale-space axioms

In image processing and computer vision, a scale-space framework can be used to represent an image as a family of gradually smoothed images. This framework is very general, and a variety of scale-space representations exist. A typical approach for choosing a particular type of scale-space representation is to establish a set of scale-space axioms, describing basic properties of the desired scale-space representation and often chosen so as to make the representation useful in practical applications. Once established, the axioms narrow the possible scale-space representations to a smaller class, typically with only a few free parameters.

A set of standard scale-space axioms, discussed below, leads to the linear Gaussian scale space, which is the most common type of scale space used in image processing and computer vision.

Scale-space axioms for the linear scale-space representation

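The linear scale-space representation $L(\cdot, t) = T_t f = g(\cdot, t) * f$ of a signal $f \colon \mathbb{R}^N \to \mathbb{R}$, obtained by smoothing with the Gaussian kernel $g(x, t) = (2\pi t)^{-N/2} e^{-|x|^2/(2t)}$ of variance $t \ge 0$, satisfies a number of properties, the "scale-space axioms", that make it a special form of multi-scale representation:

linearity
$T_t(a f_1 + b f_2) = a\, T_t f_1 + b\, T_t f_2$, where $f_1$ and $f_2$ are signals while $a$ and $b$ are constants,
shift invariance
$T_t S_{\Delta x} f = S_{\Delta x} T_t f$, where $S_{\Delta x}$ denotes the shift (translation) operator $(S_{\Delta x} f)(x) = f(x - \Delta x)$,
semi-group structure
$g(\cdot, t_1) * g(\cdot, t_2) = g(\cdot, t_1 + t_2)$, with the associated cascade smoothing property $L(\cdot, t_2) = g(\cdot, t_2 - t_1) * L(\cdot, t_1)$ for $t_2 \ge t_1$,
existence of an infinitesimal generator
$\partial_t L = A L$ for some operator $A$; for the Gaussian kernel $A = \tfrac{1}{2}\nabla^2$, which gives the diffusion equation,
non-creation of local extrema (zero-crossings) in one dimension,
non-enhancement of local extrema in any number of dimensions
$\partial_t L(x, t) \le 0$ at spatial maxima and $\partial_t L(x, t) \ge 0$ at spatial minima,
rotational symmetry
$g(x, t) = h(x_1^2 + \dots + x_N^2, t)$ for some function $h$,
scale invariance
$\hat{g}(\omega, t) = \hat{\psi}(\varphi(t)\, \omega)$ for some functions $\varphi$ and $\hat{\psi}$, where $\hat{g}$ denotes the Fourier transform of $g$,
positivity
$g(x, t) \ge 0$,
normalization
$\int_{\mathbb{R}^N} g(x, t)\, dx = 1$.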
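The kernel-level axioms can be checked numerically. The following Python sketch (assuming NumPy; the helper name `gaussian_1d`, the sampling grid and the scales `t1`, `t2` are illustrative choices, not taken from the source) samples the one-dimensional Gaussian $g(x, t)$ on a fine grid and checks positivity, normalization, the semigroup property and the diffusion equation $\partial_t g = \tfrac{1}{2}\partial_{xx} g$ up to discretization error. It is a sanity check, not a proof.

```python
import numpy as np

def gaussian_1d(x, t):
    """1-D Gaussian kernel g(x, t) with variance t (t > 0)."""
    return np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

# Fine, symmetric sampling grid; the kernels are negligible beyond |x| = 10.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
t1, t2 = 0.5, 1.0

g1 = gaussian_1d(x, t1)
g2 = gaussian_1d(x, t2)

# Positivity and normalization (a Riemann sum approximates the integral).
positive = bool(np.all(g1 >= 0.0))
mass = g1.sum() * dx

# Semigroup: g(., t1) * g(., t2) = g(., t1 + t2).
# The full discrete convolution lives on the grid -20, -20 + dx, ..., 20.
conv = np.convolve(g1, g2, mode="full") * dx
x_full = np.linspace(-20.0, 20.0, conv.size)
semigroup_err = np.max(np.abs(conv - gaussian_1d(x_full, t1 + t2)))

# Infinitesimal generator: dg/dt = (1/2) d^2 g/dx^2 (the diffusion equation),
# compared via centred finite differences at a few sample points.
h = 1e-3
xs = np.array([-1.5, -0.3, 0.0, 0.7, 2.0])
t = 0.8
dg_dt = (gaussian_1d(xs, t + h) - gaussian_1d(xs, t - h)) / (2.0 * h)
d2g_dx2 = (gaussian_1d(xs + h, t) - 2.0 * gaussian_1d(xs, t)
           + gaussian_1d(xs - h, t)) / h**2
generator_err = np.max(np.abs(dg_dt - 0.5 * d2g_dx2))

print(positive, mass, semigroup_err, generator_err)
```

Because the Gaussian decays so quickly, a Riemann sum on a fine grid reproduces the integrals essentially to machine precision, while the generator check is only as accurate as the finite differences.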

In fact, it can be shown that the Gaussian kernel is the unique choice given several different combinations of subsets of these scale-space axioms. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] Most of the axioms (linearity, shift invariance, semigroup structure) amount to requiring that scaling be a semigroup of shift-invariant linear operators, a property satisfied by a number of families of integral transforms. By contrast, "non-creation of local extrema" [4] for one-dimensional signals, or "non-enhancement of local extrema" [4] [7] [10] for higher-dimensional signals, are the crucial axioms that relate scale spaces to smoothing (formally, to parabolic partial differential equations), and hence single out the Gaussian.
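The one-dimensional non-creation property can be illustrated with a signal whose scale-space representation is known in closed form. The sketch below (assuming NumPy) uses a hypothetical test signal made of three Gaussian bumps; by the semigroup property, smoothing a bump of variance $s_0$ with $g(\cdot, t)$ simply yields a bump of variance $s_0 + t$, so $L_x(x, t)$ can be evaluated exactly on a grid. It then counts local extrema (sign changes of $L_x$) for increasing $t$. The weights, centres, grid and scale range are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_1d(x, t):
    """1-D Gaussian kernel g(x, t) with variance t (t > 0)."""
    return np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

# Hypothetical test signal: a mixture of three Gaussian bumps with a common
# initial variance s0 (all values chosen only for illustration).
weights = np.array([1.0, 0.8, 1.2])
centres = np.array([-2.0, 0.3, 2.5])
s0 = 0.3

def scale_space_derivative(x, t):
    """Exact L_x(x, t) for the mixture signal, using the semigroup property:
    g(. - mu, s0) smoothed with g(., t) equals g(. - mu, s0 + t)."""
    v = s0 + t
    out = np.zeros_like(x)
    for w, mu in zip(weights, centres):
        out += w * (-(x - mu) / v) * gaussian_1d(x - mu, v)
    return out

def count_local_extrema(x, t):
    """Number of sign changes of L_x on the grid, i.e. local extrema of L."""
    s = np.sign(scale_space_derivative(x, t))
    s = s[s != 0]
    return int(np.count_nonzero(s[1:] != s[:-1]))

x = np.linspace(-6.0, 6.0, 12001)
scales = np.linspace(0.0, 8.0, 81)
counts = [count_local_extrema(x, t) for t in scales]

# Causality in 1-D: the number of local extrema never increases with t.
non_increasing = all(a >= b for a, b in zip(counts, counts[1:]))
print(counts[0], counts[-1], non_increasing)
```

For this signal the count starts at five (three maxima and two minima) and ends at a single maximum without ever increasing. In higher dimensions local extrema can be created, which is why the weaker non-enhancement condition is used there instead.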

The Gaussian kernel is also separable in Cartesian coordinates, i.e. $g(x, t) = \prod_{i=1}^{N} g_1(x_i, t)$, where $g_1(x_i, t) = (2\pi t)^{-1/2} e^{-x_i^2/(2t)}$ is the one-dimensional Gaussian kernel. Separability is, however, not counted as a scale-space axiom, since it is a coordinate-dependent property related to issues of implementation. In addition, the requirement of separability in combination with rotational symmetry by itself forces the smoothing kernel to be a Gaussian.
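Separability is what makes Gaussian smoothing cheap in practice: an $M \times M$ kernel costs $M^2$ multiplications per pixel, whereas two one-dimensional passes cost $2M$. The following sketch (assuming NumPy; the kernel size, the scale and the random test image are arbitrary) checks that the two-dimensional kernel equals the outer product of one-dimensional kernels and that filtering rows and then columns matches a direct two-dimensional convolution.

```python
import numpy as np

def gaussian_1d(x, t):
    """1-D Gaussian kernel g(x, t) with variance t (t > 0)."""
    return np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

t = 1.5
r = np.arange(-4, 5)              # 9 taps, truncated at about 3.3 standard deviations
k1 = gaussian_1d(r.astype(float), t)

# 2-D kernel from the rotationally symmetric formula
# g(x, y, t) = exp(-(x^2 + y^2) / (2t)) / (2 pi t).
X, Y = np.meshgrid(r, r, indexing="ij")
k2 = np.exp(-(X**2 + Y**2) / (2.0 * t)) / (2.0 * np.pi * t)

# Separability: the 2-D kernel is the outer product of two 1-D kernels.
outer_err = np.max(np.abs(k2 - np.outer(k1, k1)))

rng = np.random.default_rng(0)
image = rng.random((16, 20))

def conv2d_full(img, ker):
    """Direct 'full' 2-D convolution: out[p, q] = sum ker[i, j] img[p - i, q - j]."""
    H, W = img.shape
    kh, kw = ker.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += ker[i, j] * img
    return out

direct = conv2d_full(image, k2)

# Separable implementation: 1-D convolution along each row, then each column.
rows = np.array([np.convolve(row, k1, mode="full") for row in image])
separable = np.array([np.convolve(col, k1, mode="full") for col in rows.T]).T

sep_err = np.max(np.abs(direct - separable))
print(outer_err, sep_err)
```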

There exists a generalization of the Gaussian scale-space theory to more general affine and spatio-temporal scale spaces. [10] [11] In addition to variabilities over scale, which the original scale-space theory was designed to handle, this generalized scale-space theory also comprises other types of variabilities: image deformations caused by viewing variations, approximated by local affine transformations, and relative motions between objects in the world and the observer, approximated by local Galilean transformations. In this theory, rotational symmetry is not imposed as a necessary scale-space axiom; it is instead replaced by requirements of affine and/or Galilean covariance. The generalized scale-space theory leads to predictions about receptive field profiles that are in good qualitative agreement with receptive field profiles measured by cell recordings in biological vision. [12] [13] [14]
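In the affine generalization, the rotationally symmetric kernel is replaced by an affine Gaussian kernel $g(x; \Sigma) = \frac{1}{2\pi\sqrt{\det \Sigma}} e^{-x^{T}\Sigma^{-1}x/2}$ (in two dimensions), parametrized by a symmetric positive definite covariance matrix $\Sigma$. Under a linear change of coordinates $x \mapsto Ax$ the kernel family maps onto itself, $|\det A|\, g(Ax; A\Sigma A^{T}) = g(x; \Sigma)$, which is the kernel-level identity behind affine covariance. The sketch below (assuming NumPy; $\Sigma$, $A$ and the test points are arbitrary illustrative values) checks this identity and the normalization numerically; it does not reproduce the full axiomatic derivation in [10] [11].

```python
import numpy as np

def affine_gaussian(points, sigma):
    """2-D affine Gaussian g(x; Sigma) evaluated at each row of `points` (shape (n, 2))."""
    inv = np.linalg.inv(sigma)
    quad = np.einsum("ni,ij,nj->n", points, inv, points)  # x^T Sigma^{-1} x per row
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))

# Hypothetical anisotropic covariance (symmetric positive definite).
sigma = np.array([[2.0, 0.6],
                  [0.6, 0.8]])
# Hypothetical invertible linear map standing in for a local affine deformation.
A = np.array([[1.3, 0.4],
              [-0.2, 0.9]])

rng = np.random.default_rng(1)
pts = rng.normal(size=(50, 2))

# Transformation law: |det A| * g(A x; A Sigma A^T) = g(x; Sigma).
lhs = abs(np.linalg.det(A)) * affine_gaussian(pts @ A.T, A @ sigma @ A.T)
rhs = affine_gaussian(pts, sigma)
covariance_err = np.max(np.abs(lhs - rhs))

# Normalization also holds for non-isotropic covariances (Riemann sum on a grid).
g = np.linspace(-12.0, 12.0, 961)
h = g[1] - g[0]
XX, YY = np.meshgrid(g, g, indexing="ij")
grid = np.stack([XX.ravel(), YY.ravel()], axis=1)
mass = affine_gaussian(grid, sigma).sum() * h * h
print(covariance_err, mass)
```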

In the computer vision, image processing and signal processing literature, there are many other multi-scale approaches, using wavelets and a variety of other kernels, that do not exploit or satisfy the same requirements as scale-space descriptions do; see the article on related multi-scale approaches. There has also been work on discrete scale-space concepts that carry the scale-space properties over to the discrete domain; see the article on scale space implementation for examples and references.
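For discrete signals, the analogue of the Gaussian derived in [4] is, in the normalization where the variance equals $t$, the discrete Gaussian kernel $T(n; t) = e^{-t} I_n(t)$, where $I_n$ are the modified Bessel functions of integer order. The sketch below (assuming NumPy; the helper name `discrete_gaussian` and its parameters are illustrative) computes the kernel from its generating function $\sum_n T(n; t)\, z^n = e^{t (z + z^{-1})/2 - t}$ with an FFT and checks normalization, variance and the exact discrete semigroup property. If SciPy is available, `scipy.special.ive(n, t)` should give the same values directly.

```python
import numpy as np

def discrete_gaussian(t, radius=40, n_fft=256):
    """Discrete Gaussian kernel T(n; t) = exp(-t) I_n(t) for n = -radius..radius.

    The values are the Fourier coefficients of the generating function
    exp(t (cos(theta) - 1)), computed with an FFT. Aliasing is negligible
    as long as T(n; t) is tiny for |n| near n_fft / 2 (true for moderate t).
    """
    theta = 2.0 * np.pi * np.arange(n_fft) / n_fft
    coeffs = np.fft.fft(np.exp(t * (np.cos(theta) - 1.0))).real / n_fft
    n = np.arange(-radius, radius + 1)
    return n, coeffs[n % n_fft]

t1, t2 = 1.0, 2.5
n, k1 = discrete_gaussian(t1)
_, k2 = discrete_gaussian(t2)
_, k12 = discrete_gaussian(t1 + t2, radius=80)

mass = k1.sum()                 # normalization: should be 1
variance = np.sum(n**2 * k1)    # should equal t1
# Semigroup: T(.; t1) * T(.; t2) = T(.; t1 + t2). The full convolution of two
# kernels supported on -40..40 is supported on -80..80.
semigroup_err = np.max(np.abs(np.convolve(k1, k2, mode="full") - k12))
min_value = min(k1.min(), k2.min())   # positivity, up to rounding noise
print(mass, variance, semigroup_err, min_value)
```

Unlike a sampled continuous Gaussian, this kernel satisfies the semigroup property exactly on the integer grid, which is one reason it is preferred in discrete scale-space implementations.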

References

  1. Koenderink, Jan J. (August 1984). "The structure of images". Biological Cybernetics. 50 (5): 363–370. doi:10.1007/bf00336961. PMID 6477978. S2CID 206775432.
  2. Babaud, Jean; Witkin, Andrew P.; Baudin, Michel; Duda, Richard O. (1986). "Uniqueness of the Gaussian Kernel for Scale-Space Filtering". IEEE Transactions on Pattern Analysis and Machine Intelligence. 8 (1): 26–33. doi:10.1109/TPAMI.1986.4767749. PMID 21869320. S2CID 18295906.
  3. Yuille, Alan L.; Poggio, Tomaso A. (1986). "Scaling Theorems for Zero Crossings". IEEE Transactions on Pattern Analysis and Machine Intelligence. 8 (1): 15–25. doi:10.1109/TPAMI.1986.4767748. hdl:1721.1/5655. PMID 21869319. S2CID 14815630.
  4. Lindeberg, T. (1990). "Scale-space for discrete signals". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (3): 234–254. doi:10.1109/34.49051.
  5. Lindeberg, Tony (1994). Scale-Space Theory in Computer Vision. Kluwer.
  6. Pauwels, E.J.; Van Gool, L.J.; Fiddelaers, P.; Moons, T. (1995). "An extended class of scale-invariant and recursive scale space filters". IEEE Transactions on Pattern Analysis and Machine Intelligence. 17 (7): 691–701. doi:10.1109/34.391411.
  7. Lindeberg, Tony (May 1996). "On the axiomatic foundations of linear scale-space: Combining semi-group structure with causality vs. scale invariance". In Sporring, J.; et al. (eds.). Gaussian Scale-Space Theory: Proc. PhD School on Scale-Space Theory. Copenhagen, Denmark: Kluwer Academic Publishers. pp. 75–98. urn:nbn:se:kth:diva-40221.
  8. Florack, Luc (1997). Image Structure. Kluwer Academic Publishers.
  9. Weickert, Joachim; Ishikawa, Seiji; Imiya, Atsushi (1999). "Linear Scale-Space has First been Proposed in Japan". Journal of Mathematical Imaging and Vision. 10 (3): 237–252. Bibcode:1999JMIV...10..237W. doi:10.1023/A:1008344623873. S2CID 17835046.
  10. Lindeberg, Tony (2011). "Generalized Gaussian Scale-Space Axiomatics Comprising Linear Scale-Space, Affine Scale-Space and Spatio-Temporal Scale-Space". Journal of Mathematical Imaging and Vision. 40: 36–81. Bibcode:2011JMIV...40...36L. doi:10.1007/s10851-010-0242-2. S2CID 950099.
  11. Lindeberg, Tony (2013). Generalized Axiomatic Scale-Space Theory. Advances in Imaging and Electron Physics. Vol. 178. pp. 1–96. doi:10.1016/B978-0-12-407701-0.00001-7. ISBN 9780124077010.
  12. Lindeberg, Tony (2013). "A computational theory of visual receptive fields". Biological Cybernetics. 107 (6): 589–635. doi:10.1007/s00422-013-0569-z. PMC 3840297. PMID 24197240.
  13. Lindeberg, Tony (2013). "Invariance of visual operations at the level of receptive fields". PLOS ONE. 8 (7): e66990. arXiv:1210.0754. Bibcode:2013PLoSO...866990L. doi:10.1371/journal.pone.0066990. PMC 3716821. PMID 23894283.
  14. Lindeberg, Tony (2021). "Normative theory of visual receptive fields". Heliyon. 7 (1): e05897. Bibcode:2021Heliy...705897L. doi:10.1016/j.heliyon.2021.e05897. PMC 7820928. PMID 33521348.