Randomized Hough transform

Last updated October 30, 2022

Hough transforms are techniques for object detection, a critical step in many implementations of computer vision, or data mining from images. Specifically, the Randomized Hough transform is a probabilistic variant to the classical Hough transform, and is commonly used to detect curves (straight line, circle, ellipse, etc.)^[1] The basic idea of Hough transform (HT) is to implement a voting procedure for all potential curves in the image, and at the termination of the algorithm, curves that do exist in the image will have relatively high voting scores. Randomized Hough transform (RHT) is different from HT in that it tries to avoid conducting the computationally expensive voting process for every nonzero pixel in the image by taking advantage of the geometric properties of analytical curves, and thus improve the time efficiency and reduce the storage requirement of the original algorithm.

Motivation

Although Hough transform (HT) has been widely used in curve detection, it has two major drawbacks:^[2] First, for each nonzero pixel in the image, the parameters for the existing curve and redundant ones are both accumulated during the voting procedure. Second, the accumulator array (or Hough space) is predefined in a heuristic way. The more accuracy needed, the higher parameter resolution should be defined. These two needs usually result in a large storage requirement and low speed for real applications. Therefore, RHT was brought up to tackle this problem.

Implementation

In comparison with HT, RHT takes advantage of the fact that some analytical curves can be fully determined by a certain number of points on the curve. For example, a straight line can be determined by two points, and an ellipse (or a circle) can be determined by three points. The case of ellipse detection can be used to illustrate the basic idea of RHT. The whole process generally consists of three steps:

Fit ellipses with randomly selected points.
Update the accumulator array and corresponding scores.
Output the ellipses with scores higher than some predefined threshold.

Ellipse fitting

One general equation for defining ellipses is: $a(x-p)^{2}+2b(x-p)(y-q)+c(y-q)^{2}=1$

with restriction: $ac-b^{2}>0$

However, an ellipse can be fully determined if one knows three points on it and the tangents in these points.

RHT starts by randomly selecting three points on the ellipse. Let them be $X_{1}$ , $X_{2}$ and $X_{3}$ . The first step is to find the tangents of these three points. They can be found by fitting a straight line using least squares technique for a small window of neighboring pixels.

The next step is to find the intersection points of the tangent lines. This can be easily done by solving the line equations found in the previous step. Then let the intersection points be $T_{12}$ and T $T_{23}$ , the midpoints of line segments $X_{1}X_{2}$ and $X_{2}X_{3}$ be $M_{12}$ and $M_{23}$ . Then the center of the ellipse will lie in the intersection of $T_{12}M_{12}$ and $T_{23}M_{23}$ . Again, the coordinates of the intersected point can be determined by solving line equations and the detailed process is skipped here for conciseness.

Let the coordinates of ellipse center found in previous step be $(x_{0},y_{0})$ . Then the center can be translated to the origin with $x'=x-x_{0}$ and $y'=y-y_{0}$ so that the ellipse equation can be simplified to:

$ax'^{2}+2bx'y'+cy'^{2}=1$

Now we can solve for the rest of ellipse parameters: $a$ , $b$ and $c$ by substituting the coordinates of $X_{1}$ , $X_{2}$ and $X_{3}$ into the equation above.

Accumulating

With the ellipse parameters determined from previous stage, the accumulator array can be updated correspondingly. Different from classical Hough transform, RHT does not keep "grid of buckets" as the accumulator array. Rather, it first calculates the similarities between the newly detected ellipse and the ones already stored in accumulator array. Different metrics can be used to calculate the similarity. As long as the similarity exceeds some predefined threshold, replace the one in the accumulator with the average of both ellipses and add 1 to its score. Otherwise, initialize this ellipse to an empty position in the accumulator and assign a score of 1.

Termination

Once the score of one candidate ellipse exceeds the threshold, it is determined as existing in the image (in other words, this ellipse is detected), and should be removed from the image and accumulator array so that the algorithm can detect other potential ellipses faster. The algorithm terminates when the number of iterations reaches a maximum limit or all the ellipses have been detected.

Pseudo code for RHT:^[3]

while (we find ellipses AND not reached the maximum epoch) {     for (a fixed number of iterations) {         Find a potential ellipse.         if (the ellipse is similar to an ellipse in the accumulator) then             Replace the one in the accumulator with the average of two ellipses and add 1 to the score;         else             Insert the ellipse into an empty position in the accumulator with a score of 1;     }     Select the ellipse with the best score and save it in a best ellipse table;     Eliminate the pixels of the best ellipse from the image;     Empty the accumulator; }

Related Research Articles

Bresenham's line algorithm is a line drawing algorithm that determines the points of an n-dimensional raster that should be selected in order to form a close approximation to a straight line between two points. It is commonly used to draw line primitives in a bitmap image, as it uses only integer addition, subtraction and bit shifting, all of which are very cheap operations in commonly used computer instruction sets such as x86_64. It is an incremental error algorithm, and one of the earliest algorithms developed in the field of computer graphics. An extension to the original algorithm may be used for drawing circles.

Edge detection includes a variety of mathematical methods that aim at identifying edges, curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuities in one-dimensional signals is known as step detection and the problem of finding signal discontinuities over time is known as change detection. Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction.

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

A vanishing point is a point on the image plane of a perspective drawing where the two-dimensional perspective projections of mutually parallel lines in three-dimensional space appear to converge. When the set of parallel lines is perpendicular to a picture plane, the construction is known as one-point perspective, and their vanishing point corresponds to the oculus, or "eye point", from which the image should be viewed for correct perspective geometry. Traditional linear drawings use objects with one to three sets of parallels, defining one to three vanishing points.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

<span class="mw-page-title-main">Image stitching</span> Combining multiple photographic images with overlapping fields of view

Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally.

In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

<span class="mw-page-title-main">Corner detection</span>

Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.

In computer graphics, the midpoint circle algorithm is an algorithm used to determine the points needed for rasterizing a circle. Bresenham's circle algorithm is derived from the midpoint circle algorithm. The algorithm can be generalized to conic sections.

The generalized Hough transform (GHT), introduced by Dana H. Ballard in 1981, is the modification of the Hough transform using the principle of template matching. The Hough transform was initially developed to detect analytically defined shapes. In these cases, we have knowledge of the shape and aim to find out its location and orientation in the image. This modification enables the Hough transform to be used to detect an arbitrary object described with its model.

The Kadir–Brady saliency detector extracts features of objects in images that are distinct and representative. It was invented by Timor Kadir and J. Michael Brady in 2001 and an affine invariant version was introduced by Kadir and Brady in 2004 and a robust version was designed by Shao et al. in 2007.

In the fields of computer vision and image analysis, the Harris affine region detector belongs to the category of feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or interest points so to make correspondences between images, recognize textures, categorize objects or build panoramas.

<span class="mw-page-title-main">Conic section</span> Curve obtained by intersecting a cone and a plane

In mathematics, a conic section, quadratic curve or simply conic is a curve obtained as the intersection of the surface of a cone with a plane. The three types of conic section are the hyperbola, the parabola, and the ellipse; the circle is a special case of the ellipse, though historically it was sometimes called a fourth type. The ancient Greek mathematicians studied conic sections, culminating around 200 BC with Apollonius of Perga's systematic work on their properties.

Chessboards arise frequently in computer vision theory and practice because their highly structured geometry is well-suited for algorithmic detection and processing. The appearance of chessboards in computer vision can be divided into two main areas: camera calibration and feature extraction. This article provides a unified discussion of the role that chessboards play in the canonical methods from these two areas, including references to the seminal literature, examples, and pointers to software implementations.

The circle Hough Transform (CHT) is a basic feature extraction technique used in digital image processing for detecting circles in imperfect images. The circle candidates are produced by “voting” in the Hough parameter space and then selecting local maxima in an accumulator matrix.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques labels the pixels to be a part of pixels with certain characteristics at a particular time. Here, the pixels are segmented depending on its relative movement over a period of time i.e. the time of the video sequence.

In image analysis, the generalized structure tensor (GST) is an extension of the Cartesian structure tensor to curvilinear coordinates. It is mainly used to detect and to represent the "direction" parameters of curves, just as the Cartesian structure tensor detects and represents the direction in Cartesian coordinates. Curve families generated by pairs of locally orthogonal functions have been the best studied.

In image processing, line detection is an algorithm that takes a collection of n edge points and finds all the lines on which these edge points lie. The most popular line detectors are the Hough transform and convolution-based techniques.

References

↑ D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981
↑ L. Xu, E. Oja, and P. Kultanan, "A new curve detection method: Randomized Hough transform (RHT)", Pattern Recog. Lett. 11, 1990, 331-338.
↑ S. Inverso, “Ellipse Detection Using Randomized Hough Transform”, www.saminverso.com/res/vision/EllipseDetectionOld.pdf, May 20, 2002

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981

[2] L. Xu, E. Oja, and P. Kultanan, "A new curve detection method: Randomized Hough transform (RHT)", Pattern Recog. Lett. 11, 1990, 331-338.

[3] S. Inverso, “Ellipse Detection Using Randomized Hough Transform”, www.saminverso.com/res/vision/EllipseDetectionOld.pdf, May 20, 2002

[1]

[2]

[3]