Feature detection |
---|
Edge detection |
Corner detection |
Blob detection |
Ridge detection |
Hough transform |
Structure tensor |
Affine invariant feature detection |
Feature description |
Scale space |
Edge detection includes a variety of mathematical methods that aim at identifying edges, defined as curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuities in one-dimensional signals is known as step detection and the problem of finding signal discontinuities over time is known as change detection . Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction. [1]
The purpose of detecting sharp changes in image brightness is to capture important events and changes in properties of the world. It can be shown that under rather general assumptions for an image formation model, discontinuities in image brightness are likely to correspond to: [2] [3]
In the ideal case, the result of applying an edge detector to an image may lead to a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings as well as curves that correspond to discontinuities in surface orientation. Thus, applying an edge detection algorithm to an image may significantly reduce the amount of data to be processed and may therefore filter out information that may be regarded as less relevant, while preserving the important structural properties of an image. If the edge detection step is successful, the subsequent task of interpreting the information contents in the original image may therefore be substantially simplified. However, it is not always possible to obtain such ideal edges from real life images of moderate complexity.
Edges extracted from non-trivial images are often hampered by fragmentation, meaning that the edge curves are not connected, missing edge segments as well as false edges not corresponding to interesting phenomena in the image – thus complicating the subsequent task of interpreting the image data. [4]
Edge detection is one of the fundamental steps in image processing, image analysis, image pattern recognition, and computer vision techniques.
The edges extracted from a two-dimensional image of a three-dimensional scene can be classified as either viewpoint dependent or viewpoint independent. A viewpoint independent edge typically reflects inherent properties of the three-dimensional objects, such as surface markings and surface shape. A viewpoint dependent edge may change as the viewpoint changes, and typically reflects the geometry of the scene, such as objects occluding one another.
A typical edge might for instance be the border between a block of red color and a block of yellow. In contrast a line (as can be extracted by a ridge detector) can be a small number of pixels of a different color on an otherwise unchanging background. For a line, there may therefore usually be one edge on each side of the line.
Although certain literature has considered the detection of ideal step edges, the edges obtained from natural images are usually not at all ideal step edges. Instead they are normally affected by one or several of the following effects:
A number of researchers have used a Gaussian smoothed step edge (an error function) as the simplest extension of the ideal step edge model for modeling the effects of edge blur in practical applications. [4] [5] Thus, a one-dimensional image that has exactly one edge placed at may be modeled as:
At the left side of the edge, the intensity is , and right of the edge it is . The scale parameter is called the blur scale of the edge. Ideally this scale parameter should be adjusted based on the quality of image to avoid destroying true edges of the image.[ citation needed ]
Outside of images with simple objects or featuring well-controlled lighting, edge detection is not a trivial task, since it can be difficult to determine what threshold should be used to define an edge between two pixels. [4] For example, in the following one-dimensional signal, most would intuitively say there is an edge between the 4th and 5th pixels:
5 | 7 | 6 | 4 | 152 | 148 | 149 |
However, if the intensity difference between the 4th and the 5th pixels were smaller, it would not be as easy to say that there should be an edge in the corresponding region. Similarly, if the intensity differences between the adjacent neighboring pixels were higher, one could argue that more than one edge should be considered to exist, or even none at all.
5 | 7 | 6 | 61 | 113 | 148 | 149 |
There are many methods for edge detection, but most of them can be grouped into two categories, search-based and zero-crossing based. The search-based methods detect edges by first computing a measure of edge strength, usually a first-order derivative expression such as the gradient magnitude, and then searching for local directional maxima of the gradient magnitude using a computed estimate of the local orientation of the edge, usually the gradient direction. The zero-crossing based methods search for zero crossings in a second-order derivative expression computed from the image in order to find edges, usually the zero-crossings of the Laplacian or the zero-crossings of a non-linear differential expression. As a pre-processing step to edge detection, a smoothing stage, typically Gaussian smoothing, is almost always applied (see also noise reduction).
The edge detection methods that have been published mainly differ in the types of smoothing filters that are applied and the way the measures of edge strength are computed. As many edge detection methods rely on the computation of image gradients, they also differ in the types of filters used for computing gradient estimates in the x- and y-directions.
A survey of a number of different edge detection methods can be found in (Ziou and Tabbone 1998); [6] see also the encyclopedia articles on edge detection in Encyclopedia of Mathematics [3] and Encyclopedia of Computer Science and Engineering. [7]
John Canny considered the mathematical problem of deriving an optimal smoothing filter, given the criteria of detection, localization and minimizing multiple responses to a single edge. [8] He showed that the optimal filter, given these assumptions, is a sum of four exponential terms. He also showed that this filter can be well approximated by first-order derivatives of Gaussians. Canny also introduced the notion of non-maximum suppression, which means that, given the presmoothing filters, edge points are defined as points where the gradient magnitude assumes a local maximum in the gradient direction. Looking for the zero crossing of the 2nd derivative along the gradient direction was first proposed by Haralick. [9] It took less than two decades to find a modern geometric variational meaning for that operator, that links it to the Marr–Hildreth (zero crossing of the Laplacian) edge detector. That observation was presented by Ron Kimmel and Alfred Bruckstein. [10]
Although his work was done in the early days of computer vision, the Canny edge detector (including its variations) is still a state-of-the-art edge detector. [11] Edge detectors that perform better than the Canny usually require longer computation times or a greater number of parameters.
Vladimir A. Kovalevsky [12] has suggested a quite different approach. He uses a preprocessing of the image with the Sigma filter [13] and with a special filter for the dilution of the ramps. This method uses no brightness of the image but only the intensities of the color channels which is important for detecting an edge between two adjacent pixels of equal brightness but different colors. The method scans the image two times: first along the horizontal lines and second along the vertical columns. In each horizontal line six consequent adjacent pixels are considered and five color difference between each two adjacent pixels are calculated. Each color difference is the sum of absolute differences of the intensities of the color channels Red, Green, and Blue of the corresponding adjacent pixels. If this sum is greater than a given threshold, then the sign of the color difference is set equal to the sign of the difference of the green intensities. If the green difference is zero, then the sign of the color difference is set equal to the sign of the difference of the red intensities. If, however, both the green and the red differences are zero, then the sign of the color difference is set equal to the sign of the blue difference which in this case cannot be zero since the sum is greater than the threshold. Certain conditions for the values and signs of the five color differences are specified in such way that if the conditions are fulfilled, then a short vertical stroke is put between the third and the fourth of the six pixels as the label of the edge. Similar calculations are performed for the vertical columns. In this case a short horizontal stroke is put between the third and the fourth of the six subsequent pixels. The vertical and horizontal strokes (being the one-dimensional cells of an abstract cell complex corresponding to the image) mostly compose a connected sequence representing the edge. This method is robust and very fast and, what is more important, it can detect edges between adjacent pixels of equal brightness’s if the color difference between these pixels is greater than the threshold.
The Canny–Deriche detector was derived from similar mathematical criteria as the Canny edge detector, although starting from a discrete viewpoint and then leading to a set of recursive filters for image smoothing instead of exponential filters or Gaussian filters. [14]
The differential edge detector described below can be seen as a reformulation of Canny's method from the viewpoint of differential invariants computed from a scale space representation leading to a number of advantages in terms of both theoretical analysis and sub-pixel implementation. In that aspect, Log Gabor filter have been shown to be a good choice to extract boundaries in natural scenes. [15]
Different gradient operators can be applied to estimate image gradients from the input image or a smoothed version of it. The simplest approach is to use central differences:
corresponding to the application of the following filter masks to the image data:
The well-known and earlier Sobel operator is based on the following filters:
Given such estimates of first-order image derivatives, the gradient magnitude is then computed as:
while the gradient orientation can be estimated as
Other first-order difference operators for estimating image gradient have been proposed in the Prewitt operator, Roberts cross, Kayyali [16] operator and Frei–Chen operator.
It is possible to extend filters dimension to avoid the issue of recognizing edge in low SNR image. The cost of this operation is loss in terms of resolution. Examples are Extended Prewitt 7×7.
Once we have computed a measure of edge strength (typically the gradient magnitude), the next stage is to apply a threshold, to decide whether edges are present or not at an image point. The lower the threshold, the more edges will be detected, and the result will be increasingly susceptible to noise and detecting edges of irrelevant features in the image. Conversely a high threshold may miss subtle edges, or result in fragmented edges.
If the edge is applied to just the gradient magnitude image, the resulting edges will in general be thick and some type of edge thinning post-processing is necessary. For edges detected with non-maximum suppression however, the edge curves are thin by definition and the edge pixels can be linked into edge polygon by an edge linking (edge tracking) procedure. On a discrete grid, the non-maximum suppression stage can be implemented by estimating the gradient direction using first-order derivatives, then rounding off the gradient direction to multiples of 45 degrees, and finally comparing the values of the gradient magnitude in the estimated gradient direction.
A commonly used approach to handle the problem of appropriate thresholds for thresholding is by using thresholding with hysteresis. This method uses multiple thresholds to find edges. We begin by using the upper threshold to find the start of an edge. Once we have a start point, we then trace the path of the edge through the image pixel by pixel, marking an edge whenever we are above the lower threshold. We stop marking our edge only when the value falls below our lower threshold. This approach makes the assumption that edges are likely to be in continuous curves, and allows us to follow a faint section of an edge we have previously seen, without meaning that every noisy pixel in the image is marked down as an edge. Still, however, we have the problem of choosing appropriate thresholding parameters, and suitable thresholding values may vary over the image.
This method finds connected set of pixels having a directional derivative magnitude larger than a fairly small threshold. [17] It considers only presence of gradients instead of strength of gradients. After applying a dramatically small threshold (i.e. 5), a binary image is obtained. The morphological opening operation and the morphological closing operation are applied to the binary image to close gaps. Then, the distance transform operation is applied to the binary image to clear the pixels far from the background, so blob-like shapes or other false labeled regions are deleted from the edge map.
Edge thinning is a technique used to remove the unwanted spurious points on the edges in an image. This technique is employed after the image has been filtered for noise (using median, Gaussian filter etc.), the edge operator has been applied (like the ones described above, Canny or Sobel) to detect the edges and after the edges have been smoothed using an appropriate threshold value. This removes all the unwanted points, and if applied carefully, results in one pixel thick edge elements.
Advantages:
There are many popular algorithms used to do this, one such is described below:
The number of passes across direction should be chosen according to the level of accuracy desired.
Some edge-detection operators are instead based upon second-order derivatives of the intensity. This essentially captures the rate of change in the intensity gradient. Thus, in the ideal continuous case, detection of zero-crossings in the second derivative captures local maxima in the gradient.
The early Marr–Hildreth operator is based on the detection of zero-crossings of the Laplacian operator applied to a Gaussian-smoothed image. It can be shown, however, that this operator will also return false edges corresponding to local minima of the gradient magnitude. Moreover, this operator will give poor localization at curved edges. Hence, this operator is today mainly of historical interest.
A more refined second-order edge detection approach which automatically detects edges with sub-pixel accuracy, uses the following differential approach of detecting zero-crossings of the second-order directional derivative in the gradient direction:
Following the differential geometric way of expressing the requirement of non-maximum suppression proposed by Lindeberg, [4] [18] let us introduce at every image point a local coordinate system , with the -direction parallel to the gradient direction. Assuming that the image has been pre-smoothed by Gaussian smoothing and a scale space representation at scale has been computed, we can require that the gradient magnitude of the scale space representation, which is equal to the first-order directional derivative in the -direction , should have its first order directional derivative in the -direction equal to zero
while the second-order directional derivative in the -direction of should be negative, i.e.,
Written out as an explicit expression in terms of local partial derivatives , this edge definition can be expressed as the zero-crossing curves of the differential invariant
that satisfies a sign-condition on the following differential invariant
where denote partial derivatives computed from a scale space representation obtained by smoothing the original image with a Gaussian kernel. In this way, the edges will be automatically obtained as continuous curves with sub-pixel accuracy. Hysteresis thresholding can also be applied to these differential and subpixel edge segments.
In practice, first-order derivative approximations can be computed by central differences as described above, while second-order derivatives can be computed from the scale space representation according to:
corresponding to the following filter masks:
Higher-order derivatives for the third-order sign condition can be obtained in an analogous fashion.
A recent development in edge detection techniques takes a frequency domain approach to finding edge locations. Phase congruency (also known as phase coherence) methods attempt to find locations in an image where all sinusoids in the frequency domain are in phase. These locations will generally correspond to the location of a perceived edge, regardless of whether the edge is represented by a large change in intensity in the spatial domain. A key benefit of this technique is that it responds strongly to Mach bands, and avoids false positives typically found around roof edges. A roof edge, is a discontinuity in the first order derivative of a grey-level profile. [19]
The phase stretch transform or PST is a physics-inspired computational approach to signal and image processing. One of its utilities is for feature detection and classification. [20] [21] PST is a spin-off from research on the time stretch dispersive Fourier transform. PST transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property (refractive index). The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. [22] PST performs similar functionality as phase contrast microscopy but on digital images. PST is also applicable to digital images as well as temporal, time series, data.
To increase the precision of edge detection, several subpixel techniques had been proposed, including curve-fitting, moment-based, [23] [24] reconstructive, and partial area effect methods. [25] These methods have different characteristics. Curve fitting methods are computationally simple but are easily affected by noise. Moment-based methods use an integral-based approach to reduce the effect of noise, but may require more computations in some cases. Reconstructive methods use horizontal gradients or vertical gradients to build a curve and find the peak of the curve as the sub-pixel edge. Partial area effect methods are based on the hypothesis that each pixel value depends on the area at both sides of the edge inside that pixel, producing accurate individual estimation for every edge pixel. Certain variants of the moment-based technique have been shown to be the most accurate for isolated edges. [24]
The Marr-Hildreth edge detector [26] is distinguished by its use of the Laplacian of Gaussian (LoG) operator for edge detection in digital images. Unlike other edge detection methods, the LoG approach combines Gaussian smoothing with second derivative operations, allowing for simultaneous noise reduction and edge enhancement. The key advantage of this method lies in its ability to detect edges at various scales by adjusting the standard deviation of the Gaussian kernel, enabling detection of fine details as well as broader transitions. Moreover, the technique leverages zero-crossing detection on the LoG response to precisely locate edges, offering robustness against noise and maintaining edge continuity. This approach is particularly effective for detecting edges with clear boundaries in images while minimizing false positives due to noise, making it a valuable tool in computer vision applications where accurate edge localization is crucial.
Source: [27]
% MATLAB code for prewitt% operator edge detectionk=imread("logo.png");k=rgb2gray(k);k1=double(k);p_msk=[-101;-101;-101];kx=conv2(k1,p_msk,'same');ky=conv2(k1,p_msk','same');ked=sqrt(kx.^2+ky.^2);% display the images.imtool(k,[]);% display the edge detection along x-axis.imtool(abs(kx),[]);% display the edge detection along y-axis.imtool(abs(ky),[]);% display the full edge detection.imtool(abs(ked),[]);
%Scharr operator -> edge detectionk=imread("logo.png");k=rgb2gray(k);k1=double(k);s_msk=[-303;-10010;-303];kx=conv2(k1,s_msk,'same');ky=conv2(k1,s_msk','same');ked=sqrt(kx.^2+ky.^2);%display the images.imtool(k,[]);%display the edge detection along x-axis.imtool(abs(kx),[]);%display the edge detection along y-axis.imtool(abs(ky),[]);%display the full edge detection.imtool(abs(ked),[]);
% MATLAB code for Sobel operator% edge detectionk=imread("logo.png");k=rgb2gray(k);k1=double(k);s_msk=[-101;-202;-101];kx=conv2(k1,s_msk,'same');ky=conv2(k1,s_msk','same');ked=sqrt(kx.^2+ky.^2);%display the images.imtool(k,[]);%display the edge detection along x-axis.imtool(abs(kx),[]);%display the edge detection along y-axis.imtool(abs(ky),[]);%display the full edge detection.imtool(abs(ked),[]);
The Sobel operator, sometimes called the Sobel–Feldman operator or Sobel filter, is used in image processing and computer vision, particularly within edge detection algorithms where it creates an image emphasising edges. It is named after Irwin Sobel and Gary M. Feldman, colleagues at the Stanford Artificial Intelligence Laboratory (SAIL). Sobel and Feldman presented the idea of an "Isotropic 3 × 3 Image Gradient Operator" at a talk at SAIL in 1968. Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel–Feldman operator is either the corresponding gradient vector or the norm of this vector. The Sobel–Feldman operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation that it produces is relatively crude, in particular for high-frequency variations in the image.
The Roberts cross operator is used in image processing and computer vision for edge detection. It was one of the first edge detectors and was initially proposed by Lawrence Roberts in 1963. As a differential operator, the idea behind the Roberts cross operator is to approximate the gradient of an image through discrete differentiation which is achieved by computing the sum of the squares of the differences between diagonally adjacent pixels.
The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
In computer vision, the Marr–Hildreth algorithm is a method of detecting edges in digital images, that is, continuous curves where there are strong and rapid variations in image brightness. The Marr–Hildreth edge detection method is simple and operates by convolving the image with the Laplacian of the Gaussian function, or, as a fast approximation by difference of Gaussians. Then, zero crossings are detected in the filtered result to obtain the edges. The Laplacian-of-Gaussian image operator is sometimes also referred to as the Mexican hat wavelet due to its visual shape when turned upside-down. David Marr and Ellen C. Hildreth are two of the inventors.
The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.
In mathematics, the discrete Laplace operator is an analog of the continuous Laplace operator, defined so that it has meaning on a graph or a discrete grid. For the case of a finite-dimensional graph, the discrete Laplace operator is more commonly called the Laplacian matrix.
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about have largely been smoothed away in the scale-space level at scale .
In image processing, a Gaussian blur is the result of blurring an image by a Gaussian function.
The Prewitt operator is used in image processing, particularly within edge detection algorithms. Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Prewitt operator is either the corresponding gradient vector or the norm of this vector. The Prewitt operator is based on convolving the image with a small, separable, and integer valued filter in horizontal and vertical directions and is therefore relatively inexpensive in terms of computations like Sobel and Kayyali operators. On the other hand, the gradient approximation which it produces is relatively crude, in particular for high frequency variations in the image. The Prewitt operator was developed by Judith M. S. Prewitt.
The median filter is a non-linear digital filtering technique, often used to remove noise from an image, signal, and video. Such noise reduction is a typical pre-processing step to improve the results of later processing. Median filtering is very widely used in digital image processing because, under certain conditions, it preserves edges while removing noise, also having applications in signal processing.
An image gradient is a directional change in the intensity or color in an image. The gradient of the image is one of the fundamental building blocks in image processing. For example, the Canny edge detector uses image gradient for edge detection. In graphics software for digital image editing, the term gradient or color gradient is also used for a gradual blend of color which can be considered as an even gradation from low to high values, and seen from black to white in the images to the right. Another name for this is color progression.
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.
In image processing, ridge detection is the attempt, via software, to locate ridges in an image, defined as curves whose points are local maxima of the function, akin to geographical ridges.
In mathematics, the structure tensor, also referred to as the second-moment matrix, is a matrix derived from the gradient of a function. It describes the distribution of the gradient in a specified neighborhood around a point and makes the information invariant to the observing coordinates. The structure tensor is often used in image processing and computer vision.
In computer vision, speeded up robust features (SURF) is a local feature detector and descriptor, with patented applications. It can be used for tasks such as object recognition, image registration, classification, or 3D reconstruction. It is partly inspired by the scale-invariant feature transform (SIFT) descriptor. The standard version of SURF is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT.
In the fields of computer vision and image analysis, the Harris affine region detector belongs to the category of feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or interest points so to make correspondences between images, recognize textures, categorize objects or build panoramas.
The principal curvature-based region detector, also called PCBR is a feature detector used in the fields of computer vision and image analysis. Specifically the PCBR detector is designed for object recognition applications.
Deriche edge detector is an edge detection operator developed by Rachid Deriche in 1987. It is a multistep algorithm used to obtain an optimal result of edge detection in a discrete two-dimensional image. This algorithm is based on John F. Canny's work related to the edge detection and his criteria for optimal edge detection:
Integral Channel Features (ICF), also known as ChnFtrs, is a method for object detection in computer vision. It uses integral images to extract features such as local sums, histograms and Haar-like features from multiple registered image channels. This method was highly exploited by Dollár et al. in their work for pedestrian detection, that was first described at the BMVC in 2009.
M. Gupta, S.N. Tazi, A. Jain and Deepika, "Edge Detection Using Modified Firefly Algorithm", Computational Intelligence and Communication Networks (CICN) 2014 International onference on, pp. 167-173, 14-16 Nov. 2014. View Article Google Scholar