Generalised Hough transform

Last updated

The generalized Hough transform (GHT), introduced by Dana H. Ballard in 1981, is the modification of the Hough transform using the principle of template matching. [1] The Hough transform was initially developed to detect analytically defined shapes (e.g., line, circle, ellipse etc.). In these cases, we have knowledge of the shape and aim to find out its location and orientation in the image. This modification enables the Hough transform to be used to detect an arbitrary object described with its model.

Contents

The problem of finding the object (described with a model) in an image can be solved by finding the model's position in the image. With the generalized Hough transform, the problem of finding the model's position is transformed to a problem of finding the transformation's parameter that maps the model into the image. Given the value of the transformation's parameter, the position of the model in the image can be determined.

The original implementation of the GHT used edge information to define a mapping from orientation of an edge point to a reference point of the shape. In the case of a binary image where pixels can be either black or white, every black pixel of the image can be a black pixel of the desired pattern thus creating a locus of reference points in the Hough space. Every pixel of the image votes for its corresponding reference points. The maximum points of the Hough space indicate possible reference points of the pattern in the image. This maximum can be found by scanning the Hough space or by solving a relaxed set of equations, each of them corresponding to a black pixel. [2]

History

Merlin and Farber [3] showed how to use a Hough algorithm when the desired curves could not be described analytically. It was a precursor to Ballard's algorithm that was restricted to translation and did not account for rotation and scale changes. [4]

The Merlin-Farber algorithm is impractical for real image data as in an image with many edge pixels, it finds many false positives due to repetitive pixel arrangements.

Theory of generalized Hough transform

To generalize the Hough algorithm to non-analytic curves, Ballard defines the following parameters for a generalized shape: a={y,s,θ} where y is a reference origin for the shape, θ is its orientation, and s = (sx, sy) describes two orthogonal scale factors. An algorithm can compute the best set of parameters for a given shape from edge pixel data. These parameters do not have equal status. The reference origin location, y, is described in terms of a template table called the R table of possible edge pixel orientations. The computation of the additional parameters s and θ is then accomplished by straightforward transformations to this table. The key generalization to arbitrary shapes is the use of directional information. Given any shape and a fixed reference point on it, instead of a parametric curve, the information provided by the boundary pixels is stored in the form of the R-table in the transform stage. For every edge point on the test image, the properties of the point are looked up on the R-table and reference point is retrieved and the appropriate cell in a matrix called the Accumulator matrix is incremented. The cell with maximum 'votes' in the Accumulator matrix can be a possible point of existence of fixed reference of the object in the test image.

Building the R-table

Choose a reference point y for the shape (typically chosen inside the shape). For each boundary point x, compute ɸ(x), the gradient direction and r = y – x as shown in the image. Store r as a function of ɸ. Notice that each index of ɸ may have many values of r. One can either store the co-ordinate differences between the fixed reference and the edge point ((xc – xij), (yc – yij)) or as the radial distance and the angle between them (rij, αij). Having done this for each point, the R-table will fully represent the template object. Also, since the generation phase is invertible, we may use it to localise object occurrences at other places in the image.

Geometry of shape detection for generalized Hough transform Geometry of shape detection for generalised hough transform.jpg
Geometry of shape detection for generalized Hough transform
iɸiRɸi
10(r11, α11) (r12, α12)... (r1n, α1n)
2Δɸ(r21, α21) (r22, α22)... (r2m, α2m)
32Δɸ(r31, α31) (r32, α32)... (r3k, α3k)
.........

Object localization

For each edge pixel x in the image, find the gradient ɸ and increment all the corresponding points x+r in the accumulator array A (initialized to a maximum size of the image) where r is a table entry indexed by ɸ, i.e., r(ɸ). These entry points give us each possible position for the reference point. Although some bogus points may be calculated, given that the object exists in the image, a maximum will occur at the reference point. Maxima in A correspond to possible instances of the shape.

Generalization of scale and orientation

For a fixed orientation of shape, the accumulator array was two-dimensional in the reference point co-ordinates. To search for shapes of arbitrary orientation θ and scale s, these two parameters are added to the shape description. The accumulator array now consists of four dimensions corresponding to the parameters (y, s, θ). The R-table can also be used to increment this larger dimensional space since different orientations and scales correspond to easily computed transformations of the table. Denote a particular R-table for a shape S by R(ɸ). Simple transformations to this table will allow it to detect scaled or rotated instances of the same shape. For example, if the shape is scaled by s and this transformation is denoted by Ts. then Ts[R(ɸ)] = sR(ɸ) i.e., all the vectors are scaled by s. Also, if the object is rotated by θ and this transformation is denoted by Tθ, then Tθ[R(ɸ)] = Rot{R[(ɸ-θ)mod2π],θ} i.e., all the indices are incremented by – θ modulo 2π, the appropriate vectors r are found, and then they are rotated by θ. Another property which will be useful in describing the composition of generalized Hough transforms is the change of reference point. If we want to choose a new reference point such that y-ỹ = z then the modification to the R-table is given by R(ɸ)+ z, i.e. z is added to each vector in the table.

Alternate way using pairs of edges

A pair of edge pixels can be used to reduce the parameter space. Using the R-table and the properties as described above, each edge pixel defines a surface in the four-dimensional accumulator space of a = (y, s, θ). Two edge pixels at different orientations describe the same surface rotated by the same amount with respect to θ. If these two surfaces intersect, points where they intersect will correspond to possible parameters a for the shape. Thus it is theoretically possible to use the two points in image space to reduce the locus in parameter space to a single point. However, the difficulties of finding the intersection points of the two surfaces in parameter space will make this approach unfeasible for most cases.

Composite shapes

If the shape S has a composite structure consisting of subparts S1, S2, .. SN and the reference points for the shapes S, S1, S2, .. SN are y, y1, y2, .. yn, respectively, then for a scaling factor s and orientation θ, the generalized Hough transform Rs(ɸ) is given by . The concern with this transform is that the choice of reference can greatly affect the accuracy. To overcome this, Ballard has suggested smoothing the resultant accumulator with a composite smoothing template. The composite smoothing template H(y) is given as a composite convolution of individual smoothing templates of the sub-shapes. . Then the improved Accumulator is given by As = A*H and the maxima in As corresponds to possible instances of the shape.

Spatial decomposition

Observing that the global Hough transform can be obtained by the summation of local Hough transforms of disjoint sub-region, Heather and Yang [5] proposed a method which involves the recursive subdivision of the image into sub-images, each with their own parameter space, and organized in a quadtree structure. It results in improved efficiency in finding endpoints of line segments and improved robustness and reliability in extracting lines in noisy situations, at a slightly increased cost of memory.

Implementation

The implementation uses the following equations: [6]

Combining the above equations we have:

Constructing the R-table

(0) Convert the sample shape image into an edge image using any edge detecting algorithm like Canny edge detector
(1) Pick a reference point (e.g., (xc, yc))
(2) Draw a line from the reference point to the boundary
(3) Compute ɸ
(4) Store the reference point (xc, yc) as a function of ɸ in R(ɸ) table.

Detection:

(0) Convert the sample shape image into an edge image using any edge detecting algorithm like Canny edge detector.
(1) Initialize the Accumulator table: A[xcmin . . . xcmax][ycmin . . . ycmax]
(2) For each edge point (x, y)
(2.1) Using the gradient angle ɸ, retrieve from the R-table all the (α, r) values indexed under ɸ.
(2.2) For each (α,r), compute the candidate reference points:
xc = x + r cos(α)
yc = y + r sin(α)
(2.3) Increase counters (voting):
++A([[xc]][yc])
(3) Possible locations of the object contour are given by local maxima in A[xc][yc].
If A[xc][yc] > T, then the object contour is located at (xc, yc)

General case:

Suppose the object has undergone some rotation Θ and uniform scaling s:

(x, y) → (x″, y″)
x″ = (x cos(Θ) – y sin(Θ))s
y″ = (x sin(Θ) + y cos(Θ))s
Replacing x by x″ and y by y″:
xc = x – x″ or xc = x - (x cos(Θ) – y sin(Θ))s
yc = y – y″ or yc = y - (x sin(Θ) + y cos(Θ))s
(1) Initialize the Accumulator table: A[xcmin . . . xcmax][ycmin . . . ycmax][qmin . . . qmax][smin . . . smax]
(2) For each edge point (x, y)
(2.1) Using its gradient angle ɸ, retrieve all the (α, r) values from the R-table
(2.2) For each (α, r), compute the candidate reference points:
x = r cos(α)
y = r sin(α)
for(Θ = Θmin; Θ ≤ Θmax; Θ++)
for(s = smin; s ≤ smax; s++)
xc = x - (x cos(Θ) – y sin(Θ))s
yc = y - (x sin(Θ) + y cos(Θ))s
++(A[xc][yc][Θ][s])
(3) Possible locations of the object contour are given by local maxima in A[xc][yc][Θ][s]
If A[xc][yc][Θ][s] > T, then the object contour is located at (xc, yc), has undergone a rotation Θ, and has been scaled by s.

Advantages and disadvantages

Advantages

Disadvantages

Ballard suggested using orientation information of the edge decreasing the cost of the computation. Many efficient GHT techniques have been suggested such as the SC-GHT (Using slope and curvature as local properties). [7] Davis and Yam [8] also suggested an extension of Merlin's work for orientation and scale invariant matching which complement's Ballard's work but does not include Ballard's utilization of edge-slope information and composite structures

See also

Related Research Articles

<span class="mw-page-title-main">2D computer graphics</span> Computer-based generation of digital images

2D computer graphics is the computer-based generation of digital images—mostly from two-dimensional models and by techniques specific to them. It may refer to the branch of computer science that comprises such techniques or to the models themselves.

<span class="mw-page-title-main">Lorentz group</span> Lie group of Lorentz transformations

In physics and mathematics, the Lorentz group is the group of all Lorentz transformations of Minkowski spacetime, the classical and quantum setting for all (non-gravitational) physical phenomena. The Lorentz group is named for the Dutch physicist Hendrik Lorentz.

<span class="mw-page-title-main">Euler angles</span> Description of the orientation of a rigid body

The Euler angles are three angles introduced by Leonhard Euler to describe the orientation of a rigid body with respect to a fixed coordinate system.

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

In analytical mechanics, generalized coordinates are a set of parameters used to represent the state of a system in a configuration space. These parameters must uniquely define the configuration of the system relative to a reference state. The generalized velocities are the time derivatives of the generalized coordinates of the system. The adjective "generalized" distinguishes these parameters from the traditional use of the term "coordinate" to refer to Cartesian coordinates.

<span class="mw-page-title-main">Radon transform</span> Integral transform

In mathematics, the Radon transform is the integral transform which takes a function f defined on the plane to a function Rf defined on the (two-dimensional) space of lines in the plane, whose value at a particular line is equal to the line integral of the function over that line. The transform was introduced in 1917 by Johann Radon, who also provided a formula for the inverse transform. Radon further included formulas for the transform in three dimensions, in which the integral is taken over planes. It was later generalized to higher-dimensional Euclidean spaces and more broadly in the context of integral geometry. The complex analogue of the Radon transform is known as the Penrose transform. The Radon transform is widely applicable to tomography, the creation of an image from the projection data associated with cross-sectional scans of an object.

<span class="mw-page-title-main">Vanishing point</span> Artistic concept relating to perspective

A vanishing point is a point on the image plane of a perspective rendering where the two-dimensional perspective projections of mutually parallel lines in three-dimensional space appear to converge. When the set of parallel lines is perpendicular to a picture plane, the construction is known as one-point perspective, and their vanishing point corresponds to the oculus, or "eye point", from which the image should be viewed for correct perspective geometry. Traditional linear drawings use objects with one to three sets of parallels, defining one to three vanishing points.

<span class="mw-page-title-main">Spirograph</span> Geometric drawing device

Spirograph is a geometric drawing device that produces mathematical roulette curves of the variety technically known as hypotrochoids and epitrochoids. The well-known toy version was developed by British engineer Denys Fisher and first sold in 1965.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

<span class="mw-page-title-main">Osculating circle</span> Circle of immediate corresponding curvature of a curve at a point

An osculating circle is a circle that best approximates the curvature of a curve at a specific point. It is tangent to the curve at that point and has the same curvature as the curve at that point. The osculating circle provides a way to understand the local behavior of a curve and is commonly used in differential geometry and calculus.

<span class="mw-page-title-main">Procrustes analysis</span> Statistical shape analysis technique

In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name Procrustes refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off.

<span class="mw-page-title-main">Sine and cosine</span> Fundamental trigonometric functions

In mathematics, sine and cosine are trigonometric functions of an angle. The sine and cosine of an acute angle are defined in the context of a right triangle: for the specified angle, its sine is the ratio of the length of the side that is opposite that angle to the length of the longest side of the triangle, and the cosine is the ratio of the length of the adjacent leg to that of the hypotenuse. For an angle , the sine and cosine functions are denoted simply as and .

<span class="mw-page-title-main">Kepler orbit</span> Celestial orbit whose trajectory is a conic section in the orbital plane

In celestial mechanics, a Kepler orbit is the motion of one body relative to another, as an ellipse, parabola, or hyperbola, which forms a two-dimensional orbital plane in three-dimensional space. A Kepler orbit can also form a straight line. It considers only the point-like gravitational attraction of two bodies, neglecting perturbations due to gravitational interactions with other objects, atmospheric drag, solar radiation pressure, a non-spherical central body, and so on. It is thus said to be a solution of a special case of the two-body problem, known as the Kepler problem. As a theory in classical mechanics, it also does not take into account the effects of general relativity. Keplerian orbits can be parametrized into six orbital elements in various ways.

Hough transforms are techniques for object detection, a critical step in many implementations of computer vision, or data mining from images. Specifically, the Randomized Hough transform is a probabilistic variant to the classical Hough transform, and is commonly used to detect curves The basic idea of Hough transform (HT) is to implement a voting procedure for all potential curves in the image, and at the termination of the algorithm, curves that do exist in the image will have relatively high voting scores. Randomized Hough transform (RHT) is different from HT in that it tries to avoid conducting the computationally expensive voting process for every nonzero pixel in the image by taking advantage of the geometric properties of analytical curves, and thus improve the time efficiency and reduce the storage requirement of the original algorithm.

EigenMoments is a set of orthogonal, noise robust, invariant to rotation, scaling and translation and distribution sensitive moments. Their application can be found in signal processing and computer vision as descriptors of the signal or image. The descriptors can later be used for classification purposes.

In physics and engineering, Davenport chained rotations are three chained intrinsic rotations about body-fixed specific axes. Euler rotations and Tait–Bryan rotations are particular cases of the Davenport general rotation decomposition. The angles of rotation are called Davenport angles because the general problem of decomposing a rotation in a sequence of three was studied first by Paul B. Davenport.

The circle Hough Transform (CHT) is a basic feature extraction technique used in digital image processing for detecting circles in imperfect images. The circle candidates are produced by “voting” in the Hough parameter space and then selecting local maxima in an accumulator matrix.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques labels the pixels to be a part of pixels with certain characteristics at a particular time. Here, the pixels are segmented depending on its relative movement over a period of time i.e. the time of the video sequence.

In image analysis, the generalized structure tensor (GST) is an extension of the Cartesian structure tensor to curvilinear coordinates. It is mainly used to detect and to represent the "direction" parameters of curves, just as the Cartesian structure tensor detects and represents the direction in Cartesian coordinates. Curve families generated by pairs of locally orthogonal functions have been the best studied.

In image processing, line detection is an algorithm that takes a collection of n edge points and finds all the lines on which these edge points lie. The most popular line detectors are the Hough transform and convolution-based techniques.

References

  1. D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981
  2. Jaulin, L.; Bazeille, S. (2013). Image Shape Extraction using Interval Methods (PDF). In Proceedings of Sysid 2009, Saint-Malo, France.
  3. Merlin, P. M.; Farber, D. J. (January 1975). "A Parallel Mechanism for Detecting Curves in Pictures". IEEE Transactions on Computers. C-24 (1): 96–98. doi:10.1109/t-c.1975.224087. ISSN   0018-9340. S2CID   27723442.
  4. L. Davis, "Hierarchical Generalized Hough Transforms and Line Segment Based Generalized Hough Transforms", University of Texas Computer Sciences, Nov 1980
  5. J.A. Heather, Xue Dong Yang, "Spatial Decomposition of the Hough Transform", The 2nd Canadian Conference on Computer and Robot Vision, 2005.
  6. Ballard and Brown, section 4.3.4, Sonka et al., section 5.2.6
  7. A. A. Kassim, T. Tan, K. H. Tan, "A comparative study of efficient generalised Hough transform techniques", Image and Vision Computing, Volume 17, Issue 10, Pages 737-748, August 1999
  8. L. Davis and S. Yam, "A generalized hough-like transformation for shape recognition". University of Texas Computer Sciences, TR-134, Feb 1980.