Feature (computer vision)

Last updated

In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

Contents

More broadly a feature is any piece of information that is relevant for solving the computational task related to a certain application. This is the same sense as feature in machine learning and pattern recognition generally, though image processing has a very sophisticated collection of features. The feature concept is very general and the choice of features in a particular computer vision system may be highly dependent on the specific problem at hand.

Definition

There is no universal or exact definition of what constitutes a feature, and the exact definition often depends on the problem or the type of application. Nevertheless, a feature is typically defined as an "interesting" part of an image, and features are used as a starting point for many computer vision algorithms.

Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. Consequently, the desirable property for a feature detector is repeatability : whether or not the same feature will be detected in two or more different images of the same scene.

Feature detection is a low-level image processing operation. That is, it is usually performed as the first operation on an image and examines every pixel to see if there is a feature present at that pixel. If this is part of a larger algorithm, then the algorithm will typically only examine the image in the region of the features. As a built-in pre-requisite to feature detection, the input image is usually smoothed by a Gaussian kernel in a scale-space representation and one or several feature images are computed, often expressed in terms of local image derivative operations.

Occasionally, when feature detection is computationally expensive and there are time constraints, a higher-level algorithm may be used to guide the feature detection stage so that only certain parts of the image are searched for features.

There are many computer vision algorithms that use feature detection as the initial step, so as a result, a very large number of feature detectors have been developed. These vary widely in the kinds of feature detected, the computational complexity and the repeatability.

When features are defined in terms of local neighborhood operations applied to an image, a procedure commonly referred to as feature extraction, one can distinguish between feature detection approaches that produce local decisions whether there is a feature of a given type at a given image point or not, and those who produce non-binary data as result. The distinction becomes relevant when the resulting detected features are relatively sparse. Although local decisions are made, the output from a feature detection step does not need to be a binary image. The result is often represented in terms of sets of (connected or unconnected) coordinates of the image points where features have been detected, sometimes with subpixel accuracy.

When feature extraction is done without local decision making, the result is often referred to as a feature image. Consequently, a feature image can be seen as an image in the sense that it is a function of the same spatial (or temporal) variables as the original image, but where the pixel values hold information about image features instead of intensity or color. This means that a feature image can be processed in a similar way as an ordinary image generated by an image sensor. Feature images are also often computed as integrated step in algorithms for feature detection.

Feature vectors and feature spaces

In some applications, it is not sufficient to extract only one type of feature to obtain the relevant information from the image data. Instead, two or more different features are extracted, resulting in two or more feature descriptors at each image point. A common practice is to organize the information provided by all these descriptors as the elements of one single vector, commonly referred to as a feature vector. The set of all possible feature vectors constitutes a feature space. [1]

A common example of feature vectors appears when each image point is to be classified as belonging to a specific class. Assuming that each image point has a corresponding feature vector based on a suitable set of features, meaning that each class is well separated in the corresponding feature space, the classification of each image point can be done using standard classification method.

Simplified neural network training example.svg
Simplified example of training a neural network in object detection: The network is trained by multiple images that are known to depict starfish and sea urchins, which are correlated with "nodes" that represent visual features. The starfish match with a ringed texture and a star outline, whereas most sea urchins match with a striped texture and oval shape. However, the instance of a ring textured sea urchin creates a weakly weighted association between them.
Simplified neural network example.svg
Subsequent run of the network on an input image (left): [2] The network correctly detects the starfish. However, the weakly weighted association between ringed texture and sea urchin also confers a weak signal to the latter from one of two features. In addition, a shell that was not included in the training gives a weak signal for the oval shape, also resulting in a weak signal for the sea urchin output. These weak signals may result in a false positive result for sea urchin.
In reality, textures and outlines would not be represented by single nodes, but rather by associated weight patterns of multiple nodes.

Another and related example occurs when neural network-based processing is applied to images. The input data fed to the neural network is often given in terms of a feature vector from each image point, where the vector is constructed from several different features extracted from the image data. During a learning phase, the network can itself find which combinations of different features are useful for solving the problem at hand.

Types

Edges

Edges are points where there is a boundary (or an edge) between two image regions. In general, an edge can be of almost arbitrary shape, and may include junctions. In practice, edges are usually defined as sets of points in the image that have a strong gradient magnitude. Furthermore, some common algorithms will then chain high gradient points together to form a more complete description of an edge. These algorithms usually place some constraints on the properties of an edge, such as shape, smoothness, and gradient value.

Locally, edges have a one-dimensional structure.

Corners/interest points

The terms corners and interest points are used somewhat interchangeably and refer to point-like features in an image, which have a local two-dimensional structure. The name "Corner" arose since early algorithms first performed edge detection, and then analyzed the edges to find rapid changes in direction (corners). These algorithms were then developed so that explicit edge detection was no longer required, for instance by looking for high levels of curvature in the image gradient. It was then noticed that the so-called corners were also being detected on parts of the image that were not corners in the traditional sense (for instance a small bright spot on a dark background may be detected). These points are frequently known as interest points, but the term "corner" is used by tradition[ citation needed ].

Blobs / regions of interest points

Blobs provide a complementary description of image structures in terms of regions, as opposed to corners that are more point-like. Nevertheless, blob descriptors may often contain a preferred point (a local maximum of an operator response or a center of gravity) which means that many blob detectors may also be regarded as interest point operators. Blob detectors can detect areas in an image that are too smooth to be detected by a corner detector.

Consider shrinking an image and then performing corner detection. The detector will respond to points that are sharp in the shrunk image, but may be smooth in the original image. It is at this point that the difference between a corner detector and a blob detector becomes somewhat vague. To a large extent, this distinction can be remedied by including an appropriate notion of scale. Nevertheless, due to their response properties to different types of image structures at different scales, the LoG and DoH blob detectors are also mentioned in the article on corner detection.

Ridges

For elongated objects, the notion of ridges is a natural tool. A ridge descriptor computed from a grey-level image can be seen as a generalization of a medial axis. From a practical viewpoint, a ridge can be thought of as a one-dimensional curve that represents an axis of symmetry, and in addition has an attribute of local ridge width associated with each ridge point. Unfortunately, however, it is algorithmically harder to extract ridge features from general classes of grey-level images than edge-, corner- or blob features. Nevertheless, ridge descriptors are frequently used for road extraction in aerial images and for extracting blood vessels in medical images—see ridge detection.

Detection

Writing Desk with Harris Detector.png

Feature detection includes methods for computing abstractions of image information and making local decisions at every image point whether there is an image feature of a given type at that point or not. The resulting features will be subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.

The extraction of features are sometimes made over several scalings. One of these methods is the scale-invariant feature transform (SIFT).

Common feature detectors and their classification:
Feature detector Edge Corner Blob Ridge
Canny [3] YesNoNoNo
Sobel YesNoNoNo
Harris & Stephens / Plessey [4] YesYesNoNo
SUSAN [5] YesYesNoNo
Shi & Tomasi [6] NoYesNoNo
Level curve curvature [7] NoYesNoNo
FAST [8] NoYesNoNo
Laplacian of Gaussian [7] NoYesYesNo
Difference of Gaussians [9] [10] NoYesYesNo
Determinant of Hessian [7] NoYesYesNo
Hessian strength feature measures [11] [12] NoYesYesNo
MSER [13] NoNoYesNo
Principal curvature ridges [14] [15] [16] NoNoNoYes
Grey-level blobs [17] NoNoYesNo

Extraction

Once features have been detected, a local image patch around the feature can be extracted. This extraction may involve quite considerable amounts of image processing. The result is known as a feature descriptor or feature vector. Among the approaches that are used to feature description, one can mention N-jets and local histograms (see scale-invariant feature transform for one example of a local histogram descriptor). In addition to such attribute information, the feature detection step by itself may also provide complementary attributes, such as the edge orientation and gradient magnitude in edge detection and the polarity and the strength of the blob in blob detection.

Low-level

Curvature

Image motion

Shape based

Flexible methods

Representation

A specific image feature, defined in terms of a specific structure in the image data, can often be represented in different ways. For example, an edge can be represented as a Boolean variable in each image point that describes whether an edge is present at that point. Alternatively, we can instead use a representation that provides a certainty measure instead of a Boolean statement of the edge's existence and combine this with information about the orientation of the edge. Similarly, the color of a specific region can either be represented in terms of the average color (three scalars) or a color histogram (three functions).

When a computer vision system or computer vision algorithm is designed the choice of feature representation can be a critical issue. In some cases, a higher level of detail in the description of a feature may be necessary for solving the problem, but this comes at the cost of having to deal with more data and more demanding processing. Below, some of the factors which are relevant for choosing a suitable representation are discussed. In this discussion, an instance of a feature representation is referred to as a feature descriptor, or simply descriptor.

Certainty or confidence

Two examples of image features are local edge orientation and local velocity in an image sequence. In the case of orientation, the value of this feature may be more or less undefined if more than one edge are present in the corresponding neighborhood. Local velocity is undefined if the corresponding image region does not contain any spatial variation. As a consequence of this observation, it may be relevant to use a feature representation that includes a measure of certainty or confidence related to the statement about the feature value. Otherwise, it is a typical situation that the same descriptor is used to represent feature values of low certainty and feature values close to zero, with a resulting ambiguity in the interpretation of this descriptor. Depending on the application, such an ambiguity may or may not be acceptable.

In particular, if a featured image will be used in subsequent processing, it may be a good idea to employ a feature representation that includes information about certainty or confidence. This enables a new feature descriptor to be computed from several descriptors, for example, computed at the same image point but at different scales, or from different but neighboring points, in terms of a weighted average where the weights are derived from the corresponding certainties. In the simplest case, the corresponding computation can be implemented as a low-pass filtering of the featured image. The resulting feature image will, in general, be more stable to noise.

Averageability

In addition to having certainty measures included in the representation, the representation of the corresponding feature values may itself be suitable for an averaging operation or not. Most feature representations can be averaged in practice, but only in certain cases can the resulting descriptor be given a correct interpretation in terms of a feature value. Such representations are referred to as averageable.

For example, if the orientation of an edge is represented in terms of an angle, this representation must have a discontinuity where the angle wraps from its maximal value to its minimal value. Consequently, it can happen that two similar orientations are represented by angles that have a mean that does not lie close to either of the original angles and, hence, this representation is not averageable. There are other representations of edge orientation, such as the structure tensor, which are averageable.

Another example relates to motion, where in some cases only the normal velocity relative to some edge can be extracted. If two such features have been extracted and they can be assumed to refer to same true velocity, this velocity is not given as the average of the normal velocity vectors. Hence, normal velocity vectors are not averageable. Instead, there are other representations of motions, using matrices or tensors, that give the true velocity in terms of an average operation of the normal velocity descriptors.[ citation needed ]

Matching

Features detected in each image can be matched across multiple images to establish corresponding features such as corresponding points.

The algorithm is based on comparing and analyzing point correspondences between the reference image and the target image. If any part of the cluttered scene shares correspondences greater than the threshold, that part of the cluttered scene image is targeted and considered to include the reference object there. [18]

See also

Related Research Articles

Edge detection includes a variety of mathematical methods that aim at identifying edges, defined as curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuities in one-dimensional signals is known as step detection and the problem of finding signal discontinuities over time is known as change detection. Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction.

<span class="mw-page-title-main">Canny edge detector</span> Image edge detection algorithm

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about have largely been smoothed away in the scale-space level at scale .

<span class="mw-page-title-main">Corner detection</span> Approach used in computer vision systems

Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.

<span class="mw-page-title-main">Scale-space segmentation</span>

Scale-space segmentation or multi-scale segmentation is a general framework for signal and image segmentation, based on the computation of image descriptors at multiple scales of smoothing.

In image processing, ridge detection is the attempt, via software, to locate ridges in an image, defined as curves whose points are local maxima of the function, akin to geographical ridges.

In computer vision, blob detection methods are aimed at detecting regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions. Informally, a blob is a region of an image in which some properties are constant or approximately constant; all the points in a blob can be considered in some sense to be similar to each other. The most common method for blob detection is by using convolution.

Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point. Equivalently, affine shape adaptation can be accomplished by iteratively warping a local image patch with affine transformations while applying a rotationally symmetric filter to the warped image patches. Provided that this iterative process converges, the resulting fixed point will be affine invariant. In the area of computer vision, this idea has been used for defining affine invariant interest point operators as well as affine invariant texture analysis methods.

In mathematics, the structure tensor, also referred to as the second-moment matrix, is a matrix derived from the gradient of a function. It describes the distribution of the gradient in a specified neighborhood around a point and makes the information invariant to the observing coordinates. The structure tensor is often used in image processing and computer vision.

In computer vision, speeded up robust features (SURF) is a patented local feature detector and descriptor. It can be used for tasks such as object recognition, image registration, classification, or 3D reconstruction. It is partly inspired by the scale-invariant feature transform (SIFT) descriptor. The standard version of SURF is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT.

The Kadir–Brady saliency detector extracts features of objects in images that are distinct and representative. It was invented by Timor Kadir and J. Michael Brady in 2001 and an affine invariant version was introduced by Kadir and Brady in 2004 and a robust version was designed by Shao et al. in 2007.

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.

In the fields of computer vision and image analysis, the Harris affine region detector belongs to the category of feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or interest points so to make correspondences between images, recognize textures, categorize objects or build panoramas.

The Hessian affine region detector is a feature detector used in the fields of computer vision and image analysis. Like other feature detectors, the Hessian affine detector is typically used as a preprocessing step to algorithms that rely on identifiable, characteristic interest points.

In computer vision, 3D object recognition involves recognizing and determining 3D information, such as the pose, volume, or shape, of user-chosen 3D objects in a photograph or range scan. Typically, an example of the object to be recognized is presented to a vision system in a controlled environment, and then for an arbitrary input such as a video stream, the system locates the previously presented object. This can be done either off-line, or in real-time. The algorithms for solving this problem are specialized for locating a single pre-identified object, and can be contrasted with algorithms which operate on general classes of objects, such as face recognition systems or 3D generic object recognition. Due to the low cost and ease of acquiring photographs, a significant amount of research has been devoted to 3D object recognition in photographs.

In computer vision, maximally stable extremal regions (MSER) technique is used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements taken from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to the wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

The principal curvature-based region detector, also called PCBR is a feature detector used in the fields of computer vision and image analysis. Specifically the PCBR detector is designed for object recognition applications.

Geometric feature learning is a technique combining machine learning and computer vision to solve visual tasks. The main goal of this method is to find a set of representative features of geometric form to represent an object by collecting geometric features from images and learning them using efficient machine learning methods. Humans solve visual tasks and can give fast response to the environment by extracting perceptual information from what they see. Researchers simulate humans' ability of recognizing objects to solve computer vision problems. For example, M. Mata et al.(2002) applied feature learning techniques to the mobile robot navigation tasks in order to avoid obstacles. They used genetic algorithms for learning features and recognizing objects (figures). Geometric feature learning methods can not only solve recognition problems but also predict subsequent actions by analyzing a set of sequential input sensory images, usually some extracting features of images. Through learning, some hypothesis of the next action are given and according to the probability of each hypothesis give a most probable action. This technique is widely used in the area of artificial intelligence.

Chessboards arise frequently in computer vision theory and practice because their highly structured geometry is well-suited for algorithmic detection and processing. The appearance of chessboards in computer vision can be divided into two main areas: camera calibration and feature extraction. This article provides a unified discussion of the role that chessboards play in the canonical methods from these two areas, including references to the seminal literature, examples, and pointers to software implementations.

References

  1. Scott E Umbaugh (27 January 2005). Computer Imaging: Digital Image Analysis and Processing. CRC Press. ISBN   978-0-8493-2919-7.
  2. Ferrie, C., & Kaiser, S. (2019). Neural Networks for Babies. Sourcebooks. ISBN   1492671207.{{cite book}}: CS1 maint: multiple names: authors list (link)
  3. Canny, J. (1986). "A Computational Approach To Edge Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence. 8 (6): 679–714. doi:10.1109/TPAMI.1986.4767851. PMID   21869365. S2CID   13284142.
  4. C. Harris; M. Stephens (1988). "A combined corner and edge detector" (PDF). Proceedings of the 4th Alvey Vision Conference. pp. 147–151. Archived from the original (PDF) on 2022-04-01. Retrieved 2021-02-11.
  5. S. M. Smith; J. M. Brady (May 1997). "SUSAN - a new approach to low level image processing". International Journal of Computer Vision. 23 (1): 45–78. doi:10.1023/A:1007963824710. S2CID   15033310.
  6. J. Shi; C. Tomasi (June 1994). "Good Features to Track". 9th IEEE Conference on Computer Vision and Pattern Recognition. Springer.
  7. 1 2 3 T. Lindeberg (1998). "Feature detection with automatic scale selection" (abstract). International Journal of Computer Vision. 30 (2): 77–116. doi:10.1023/A:1008045108935. S2CID   723210.
  8. E. Rosten; T. Drummond (2006). "Machine learning for high-speed corner detection". European Conference on Computer Vision. Springer. pp. 430–443. CiteSeerX   10.1.1.60.3991 . doi:10.1007/11744023_34.
  9. J. L. Crowley and A. C. Parker, "A Representation for Shape Based on Peaks and Ridges in the Difference of Low Pass Transform [ dead link ]", IEEE Transactions on PAMI, PAMI 6 (2), pp. 156–170, March 1984.
  10. D. Lowe (2004). "Distinctive Image Features from Scale-Invariant Keypoints". International Journal of Computer Vision. 60 (2): 91. CiteSeerX   10.1.1.73.2924 . doi:10.1023/B:VISI.0000029664.99615.94. S2CID   221242327.
  11. T. Lindeberg "Scale selection properties of generalized scale-space interest point detectors", Journal of Mathematical Imaging and Vision, Volume 46, Issue 2, pages 177-210, 2013.
  12. T. Lindeberg ``Image matching using generalized scale-space interest points", Journal of Mathematical Imaging and Vision, volume 52, number 1, pages 3-36, 2015.
  13. J. Matas; O. Chum; M. Urban; T. Pajdla (2002). "Robust wide baseline stereo from maximally stable extremum regions" (PDF). British Machine Vision Conference. pp. 384–393.
  14. R. Haralick, "Ridges and Valleys on Digital Images", Computer Vision, Graphics, and Image Processing vol. 22, no. 10, pp. 28–38, Apr. 1983.
  15. D. Eberly, R. Gardner, B. Morse, S. Pizer, C. Scharlach, Ridges for image analysis, Journal of Mathematical Imaging and Vision, v. 4 n. 4, pp. 353–373, Dec. 1994.
  16. T. Lindeberg (1998). "Edge detection and ridge detection with automatic scale selection" (abstract). International Journal of Computer Vision. 30 (2): 117–154. doi:10.1023/A:1008097225773. S2CID   207658261.
  17. T. Lindeberg (1993). "Detecting Salient Blob-Like Image Structures and Their Scales with a Scale-Space Primal Sketch: A Method for Focus-of-Attention" (abstract). International Journal of Computer Vision. 11 (3): 283–318. doi:10.1007/BF01469346. S2CID   11998035.
  18. "Object Detection in a Cluttered Scene Using Point Feature Matching - MATLAB & Simulink". www.mathworks.com. Retrieved 2019-07-06.

Further reading