Scale-space segmentation

A one-dimensional example of scale-space segmentation: a signal (black), multi-scale-smoothed versions of it (red), and segment averages (blue) based on scale-space segmentation.
The dendrogram corresponding to the segmentations in the figure above. Each "×" identifies the position of an extremum of the first derivative of one of 15 smoothed versions of the signal (red for maxima, blue for minima). Each "+" identifies the position that the extremum tracks back to at the finest scale. The signal features that persist to the highest scale (smoothest version) are evident as the tall structures that correspond to the major segment boundaries in the figure above.

Scale-space segmentation or multi-scale segmentation is a general framework for signal and image segmentation, based on the computation of image descriptors at multiple scales of smoothing.

One-dimensional hierarchical signal segmentation

Witkin's seminal work in scale space [1] included the notion that a one-dimensional signal could be unambiguously segmented into regions, with one scale parameter controlling the scale of segmentation.

A key observation is that the zero-crossings of the second derivatives (which are minima and maxima of the first derivative or slope) of multi-scale-smoothed versions of a signal form a nesting tree, which defines hierarchical relations between segments at different scales. Specifically, slope extrema at coarse scales can be traced back to corresponding features at fine scales. When a slope maximum and slope minimum annihilate each other at a larger scale, the three segments that they separated merge into one segment, thus defining the hierarchy of segments.
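The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Witkin's original implementation: the helper names (`gaussian_kernel`, `smooth`, `slope_extrema`), the truncation of the kernel at four standard deviations, the edge-replicating border handling, and the staircase test signal are all choices made here for the example. Slope extrema are located as sign changes of the second difference of the smoothed signal.

```python
import math

def gaussian_kernel(sigma):
    """Sampled Gaussian, truncated at 4*sigma and normalised to sum to 1."""
    radius = max(1, math.ceil(4 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Gaussian-smooth a 1-D signal, replicating the end samples at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def slope_extrema(signal, sigma):
    """Indices i where the slope between samples i and i+1 of the smoothed
    signal is a local maximum or minimum (sign change of the second difference)."""
    s = smooth(signal, sigma)
    slope = [b - a for a, b in zip(s, s[1:])]
    return [
        i for i in range(1, len(slope) - 1)
        if (slope[i] - slope[i - 1]) * (slope[i + 1] - slope[i]) < 0
    ]

# Hypothetical test signal: a staircase with a small dip between two rises.
signal = [0.0] * 30 + [1.0] * 5 + [0.8] * 5 + [2.0] * 30

for sigma in (1.0, 2.0, 4.0, 8.0):
    print(f"sigma={sigma}: boundaries at {slope_extrema(signal, sigma)}")
```

For this staircase, the finest scale reports three slope extrema (two maxima separated by a minimum), while at the coarsest scale the minimum and one maximum have annihilated each other and a single boundary remains. Tracking each coarse-scale extremum back to its fine-scale position, which the sketch does not do, is what yields the segment hierarchy.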

Image segmentation and primal sketch

Numerous approaches have been proposed in this area, a few of which have matured to the point where they can be applied either with interactive manual intervention (usually in medical imaging) or fully automatically. The following is a brief overview of the main research ideas on which current approaches are based.

The nesting structure that Witkin described is, however, specific to one-dimensional signals and does not trivially transfer to higher-dimensional images. Nevertheless, this general idea has inspired several other authors to investigate coarse-to-fine schemes for image segmentation. Koenderink [2] proposed studying how iso-intensity contours evolve over scales, and this approach was investigated in more detail by Lifshitz and Pizer. [3] Unfortunately, the intensity of image features changes over scales, which makes it hard to trace coarse-scale image features to finer scales using iso-intensity information.
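The difficulty with iso-intensity tracking can be seen in a small numerical sketch. Assuming a hypothetical one-dimensional Gaussian bump as the image feature and a fixed intensity level of 0.5 (both chosen here purely for illustration, as are the helper names), Gaussian smoothing lowers the peak as the scale grows, so the level set at that fixed intensity changes extent and eventually vanishes even though the feature is still present at the coarser scale.

```python
import math

def smooth(signal, sigma):
    """Gaussian-smooth a 1-D signal (kernel truncated at 4*sigma, edges replicated)."""
    radius = max(1, math.ceil(4 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)] for k, w in enumerate(weights)) / total
        for i in range(len(signal))
    ]

def level_set_width(signal, level):
    """Number of samples at or above a fixed intensity level."""
    return sum(1 for v in signal if v >= level)

# Hypothetical feature: a unit-height bump with a standard deviation of 3 samples.
bump = [math.exp(-0.5 * ((i - 50) / 3.0) ** 2) for i in range(101)]

for sigma in (1.0, 3.0, 6.0):
    smoothed = smooth(bump, sigma)
    print(f"sigma={sigma}: peak={max(smoothed):.3f}, "
          f"samples >= 0.5: {level_set_width(smoothed, 0.5)}")
```

At the largest scale in this example the peak has dropped below 0.5, so the contour at that level no longer exists and cannot be linked to its fine-scale counterpart.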

Lindeberg [4] studied the problem of linking local extrema and saddle points over scales, and proposed an image representation called the scale-space primal sketch, which makes explicit the relations between structures at different scales and identifies which image features are stable over large ranges of scale, together with locally appropriate scales for those features. Bergholm [5] proposed detecting edges at coarse scales in scale space and then tracing them back to finer scales, with manual choice of both the coarse detection scale and the fine localization scale.

Gauch and Pizer [6] studied the complementary problem of ridges and valleys at multiple scales and developed a tool for interactive image segmentation based on multi-scale watersheds. The use of multi-scale watersheds applied to the gradient map has also been investigated by Olsen and Nielsen [7] and has been carried over to clinical use by Dam et al. [8] Vincken et al. [9] proposed a hyperstack for defining probabilistic relations between image structures at different scales. The use of stable image structures over scales has been developed by Ahuja and his co-workers [10] [11] into a fully automated system. A fully automatic brain segmentation algorithm based on closely related ideas of multi-scale watersheds has been presented by Undeman and Lindeberg [12] and has been extensively tested on brain databases.

These ideas for multi-scale image segmentation by linking image structures over scales have also been picked up by Florack and Kuijper. [13] Bijaoui and Rué [14] associate structures detected in scale-space above a minimum noise threshold into an object tree which spans multiple scales and corresponds to a kind of feature in the original signal. Extracted features are accurately reconstructed using an iterative conjugate gradient matrix method.

Segmentation of vector functions of time

Lyon [15] extended scale-space segmentation in another direction, to vector-valued functions of time. In that setting the vector derivative does not have maxima and minima, and the second derivative does not have zero crossings, so segment boundaries are placed instead at maxima of the Euclidean magnitude of the vector derivative of the smoothed vector signals. This technique has been applied to the segmentation of speech and of text. [16]
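The boundary rule for vector signals can be sketched as follows. This is an illustration under assumptions rather than Lyon's implementation: each component is smoothed independently with the same Gaussian, the derivative is approximated by a first difference, and the function names and the two-component test sequence are hypothetical.

```python
import math

def smooth(values, sigma):
    """Gaussian-smooth a 1-D sequence (kernel truncated at 4*sigma, edges replicated)."""
    radius = max(1, math.ceil(4 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + k - radius, 0), last)] for k, w in enumerate(weights)) / total
        for i in range(len(values))
    ]

def vector_boundaries(frames, sigma):
    """frames: list of equal-length vectors, one per time step.
    Returns indices t where the Euclidean magnitude of the difference between
    smoothed frames t and t+1 is a local maximum."""
    dims = len(frames[0])
    channels = [smooth([f[d] for f in frames], sigma) for d in range(dims)]
    speed = [
        math.sqrt(sum((ch[t + 1] - ch[t]) ** 2 for ch in channels))
        for t in range(len(frames) - 1)
    ]
    return [
        t for t in range(1, len(speed) - 1)
        if speed[t] > speed[t - 1] and speed[t] >= speed[t + 1]
    ]

# Hypothetical sequence: the first component changes after frame 19,
# the second component changes after frame 39.
frames = [(0.0, 0.0)] * 20 + [(1.0, 0.0)] * 20 + [(1.0, 1.0)] * 20
print(vector_boundaries(frames, sigma=1.5))
```

Because the magnitude combines all components, a change in any one of them can produce a boundary, and increasing `sigma` merges nearby changes in the same way as in the scalar case.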

References

  1. Witkin, A. (1984). "Scale-space filtering: A new approach to multi-scale description" (PDF). ICASSP '84. IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 9. pp. 150–153. doi:10.1109/ICASSP.1984.1172729. S2CID 11755124. Archived from the original (PDF) on 2019-08-01. Retrieved 2019-08-01.
  2. Koenderink, Jan (1984). "The structure of images". Biological Cybernetics. 50: 363–370.
  3. Lifshitz, L.M.; Pizer, S.M. (1990). "A multiresolution hierarchical approach to image segmentation based on intensity extrema". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (6): 529–540. doi:10.1109/34.56189.
  4. Lindeberg, Tony (1993). "Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention". International Journal of Computer Vision. 11 (3): 283–318. doi:10.1007/BF01469346. S2CID 11998035.
  5. Bergholm, F. (1987). "Edge focusing". IEEE Transactions on Pattern Analysis and Machine Intelligence. 9 (6): 726–741. doi:10.1109/tpami.1987.4767980. PMID 21869435. S2CID 18352198.
  6. Gauch, J.M.; Pizer, S.M. (1993). "Multiresolution analysis of ridges and valleys in grey-scale images". IEEE Transactions on Pattern Analysis and Machine Intelligence. 15 (6): 635–646. doi:10.1109/34.216734.
  7. Olsen, Ole Fogh; Nielsen, Mads (1997). "Multi-scale gradient magnitude watershed segmentation" (PDF). Image Analysis and Processing. Lecture Notes in Computer Science. Vol. 1310. pp. 6–13. doi:10.1007/3-540-63507-6_178. ISBN 978-3-540-63507-9.
  8. Dam, E.; Johansen, P.; Olsen, O.; Thomsen, A.; Darvann, T.; Dobrzenieck, A.; Hermann, N.; Kitai, N.; Kreiborg, S.; Larsen, P.; Nielsen, M. (2000). "Interactive multi-scale segmentation in clinical use". European Congress of Radiology 2000.
  9. Vincken, K.L.; Koster, A.S.E.; Viergever, M.A. (1997). "Probabilistic multiscale image segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (2): 109–120. doi:10.1109/34.574787.
  10. Tabb, M.; Ahuja, N. (1997). "Multiscale image segmentation by integrated edge and region detection". IEEE Transactions on Image Processing. 6 (5): 642–655. Bibcode:1997ITIP....6..642T. doi:10.1109/83.568922. PMID 18282958.
  11. Akbas, Emre; Ahuja, Narendra (2010). "From Ramp Discontinuities to Segmentation Tree". Computer Vision – ACCV 2009. Lecture Notes in Computer Science. Vol. 5994. pp. 123–134. doi:10.1007/978-3-642-12307-8_12. ISBN 978-3-642-12306-1.
  12. Undeman, Carl; Lindeberg, Tony (2003). "Fully Automatic Segmentation of MRI Brain Images Using Probabilistic Anisotropic Diffusion and Multi-scale Watersheds". Scale Space Methods in Computer Vision. Lecture Notes in Computer Science. Vol. 2695. pp. 641–656. doi:10.1007/3-540-44935-3_45. ISBN 978-3-540-40368-5.
  13. Florack, L. M. J.; Kuijper, A. (2000). "The topological structure of scale-space images" (PDF). Journal of Mathematical Imaging and Vision. 12 (1): 65–79. doi:10.1023/A:1008304909717. hdl:1874/18929. S2CID 7515494.
  14. Bijaoui, Albert; Rué, Frédéric (1995). "A multiscale vision model adapted to the astronomical images". Signal Processing. 46 (3): 345–362. doi:10.1016/0165-1684(95)00093-4.
  15. Lyon, Richard F. (1987). "Speech recognition in scale space". Proc. of 1987 ICASSP. San Diego, March. pp. 29.3.14.
  16. Slaney, M.; Ponceleon, D. (2001). "Hierarchical segmentation using latent semantic indexing in scalespace" (PDF). Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '01). Archived from the original (PDF) on 2006-09-19. Retrieved 2006-11-01.
