Geometric hashing

Last updated December 04, 2023

In computer science, geometric hashing is a method for efficiently finding two-dimensional objects represented by discrete points that have undergone an affine transformation, though extensions exist to other object representations and transformations. In an off-line step, the objects are encoded by treating each pair of points as a geometric basis. The remaining points can be represented in an invariant fashion with respect to this basis using two parameters. For each point, its quantized transformed coordinates are stored in the hash table as a key, and indices of the basis points as a value. Then a new pair of basis points is selected, and the process is repeated. In the on-line (recognition) step, randomly selected pairs of data points are considered as candidate bases. For each candidate basis, the remaining data points are encoded according to the basis and possible correspondences from the object are found in the previously constructed table. The candidate basis is accepted if a sufficiently large number of the data points index a consistent object basis.

Geometric hashing in computer vision

Geometric hashing is a method used for object recognition. Let’s say that we want to check if a model image can be seen in an input image. This can be accomplished with geometric hashing. The method could be used to recognize one of the multiple objects in a base, in this case the hash table should store not only the pose information but also the index of object model in the base.

Example

For simplicity, this example will not use too many point features and assume that their descriptors are given by their coordinates only (in practice local descriptors such as SIFT could be used for indexing).

Training Phase

Points of the object in the image coordinate system, and axes for the coordinate system for the basis (P2,P4) GeometricHasingExample.png — Points of the object in the image coordinate system, and axes for the coordinate system for the basis (P2,P4)

Find the model's feature points. Assume that 5 feature points are found in the model image with the coordinates $(12,17);$ $(45,13);$ $(40,46);$ $(20,35);$ $(35,25)$ , see the picture.
Introduce a basis to describe the locations of the feature points. For 2D space and similarity transformation the basis is defined by a pair of points. The point of origin is placed in the middle of the segment connecting the two points (P2, P4 in our example), the $x'$ axis is directed towards one of them, the $y'$ is orthogonal and goes through the origin. The scale is selected such that absolute value of $x'$ for both basis points is 1.
Describe feature locations with respect to that basis, i.e. compute the projections to the new coordinate axes. The coordinates should be discretised to make recognition robust to noise, we take the bin size 0.25. We thus get the coordinates $(-0.75,-1.25);$ $(1.00,0.00);$ $(-0.50,1.25);$ $(-1.00,0.00);$ $(0.00,0.25)$
Store the basis in a hash table indexed by the features (only transformed coordinates in this case). If there were more objects to match with, we should also store the object number along with the basis pair.
Repeat the process for a different basis pair (Step 2). It is needed to handle occlusions. Ideally, all the non-colinear pairs should be enumerated. We provide the hash table after two iterations, the pair (P1, P3) is selected for the second one.

Hash Table:

Vector ( $x'$ , $y'$ )	basis
$(-0.75,-1.25);$	(P2,P4)
$(1.00,0.00);$	(P2,P4)
$(-0.50,1.25);$	(P2,P4)
$(-1.00,0.00);$	(P2,P4)
$(0.00,0.25)$	(P2,P4)
$(1.00,0.00);$	(P1,P3)
$(0.00,1.25);$	(P1,P3)
$(-1.00,0.00);$	(P1,P3)
$(0.00,-0.25);$	(P1,P3)
$(0.00,0.50)$	(P1,P3)

Most hash tables cannot have identical keys mapped to different values. So in real life one won’t encode basis keys (1.0, 0.0) and (-1.0, 0.0) in a hash table.

Recognition Phase

Find interesting feature points in the input image.
Choose an arbitrary basis. If there isn't a suitable arbitrary basis, then it is likely that the input image does not contain the target object.
Describe coordinates of the feature points in the new basis. Quantize obtained coordinates as it was done before.
Compare all the transformed point features in the input image with the hash table. If the point features are identical or similar, then increase the count for the corresponding basis (and the type of object, if any).
For each basis such that the count exceeds a certain threshold, verify the hypothesis that it corresponds to an image basis chosen in Step 2. Transfer the image coordinate system to the model one (for the supposed object) and try to match them. If successful, the object is found. Otherwise, go back to Step 2.

Finding mirrored pattern

It seems that this method is only capable of handling scaling, translation, and rotation. However, the input image may contain the object in mirror transform. Therefore, geometric hashing should be able to find the object, too. There are two ways to detect mirrored objects.

For the vector graph, make the left side positive, and the right side negative. Multiplying the x position by -1 will give the same result.
Use 3 points for the basis. This allows detecting mirror images (or objects). Actually, using 3 points for the basis is another approach for geometric hashing.

Geometric hashing in higher-dimensions

Similar to the example above, hashing applies to higher-dimensional data. For three-dimensional data points, three points are also needed for the basis. The first two points define the x-axis, and the third point defines the y-axis (with the first point). The z-axis is perpendicular to the created axis using the right-hand rule. Notice that the order of the points affects the resulting basis

Related Research Articles

In mathematics, analytic geometry, also known as coordinate geometry or Cartesian geometry, is the study of geometry using a coordinate system. This contrasts with synthetic geometry.

<span class="mw-page-title-main">Cartesian coordinate system</span> Most common coordinate system (geometry)

In geometry, a Cartesian coordinate system in a plane is a coordinate system that specifies each point uniquely by a pair of real numbers called coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, called coordinate lines, coordinate axes or just axes of the system. The point where they meet is called the origin and has $(0, 0)$ as coordinates.

<span class="mw-page-title-main">2D computer graphics</span> Computer-based generation of digital images

2D computer graphics is the computer-based generation of digital images—mostly from two-dimensional models and by techniques specific to them. It may refer to the branch of computer science that comprises such techniques or to the models themselves.

In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine the position of the points or other geometric elements on a manifold such as Euclidean space. The order of the coordinates is significant, and they are sometimes identified by their position in an ordered tuple and sometimes by a letter, as in "the x-coordinate". The coordinates are taken to be real numbers in elementary mathematics, but may be complex numbers or elements of a more abstract system such as a commutative ring. The use of a coordinate system allows problems in geometry to be translated into problems about numbers and vice versa; this is the basis of analytic geometry.

In Euclidean geometry, a translation is a geometric transformation that moves every point of a figure, shape or space by the same distance in a given direction. A translation can also be interpreted as the addition of a constant vector to every point, or as shifting the origin of the coordinate system. In a Euclidean space, any translation is an isometry.

A 3D projection is a design technique used to display a three-dimensional (3D) object on a two-dimensional (2D) surface. These projections rely on visual perspective and aspect analysis to project a complex object for viewing capability on a simpler plane.

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

In plane geometry, a shear mapping is an affine transformation that displaces each point in a fixed direction by an amount proportional to its signed distance from a given line parallel to that direction. This type of mapping is also called shear transformation, transvection, or just shearing.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

The computer graphics pipeline, also known as the rendering pipeline or graphics pipeline, is a framework within computer graphics that outlines the necessary procedures for transforming a three-dimensional (3D) scene into a two-dimensional (2D) representation on a screen. Once a 3D model is generated, whether it's for a video game or any other form of 3D computer animation, the graphics pipeline converts the model into a visually perceivable format on the computer display. Due to the dependence on specific software, hardware configurations, and desired display attributes, a universally applicable graphics pipeline does not exist. Nevertheless, graphics application programming interfaces (APIs), such as Direct3D and OpenGL, were developed to standardize common procedures and oversee the graphics pipeline of a given hardware accelerator. These APIs provide an abstraction layer over the underlying hardware, relieving programmers from the need to write code explicitly targeting various graphics hardware accelerators like AMD, Intel, Nvidia, and others.

<span class="mw-page-title-main">Image stitching</span> Combining multiple photographic images with overlapping fields of view

Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally.

In projective geometry, a homography is an isomorphism of projective spaces, induced by an isomorphism of the vector spaces from which the projective spaces derive. It is a bijection that maps lines to lines, and thus a collineation. In general, some collineations are not homographies, but the fundamental theorem of projective geometry asserts that is not so in the case of real projective spaces of dimension at least two. Synonyms include projectivity, projective transformation, and projective collineation.

<span class="mw-page-title-main">UV mapping</span> Process of projecting a 3D models surface to a 2D image for texture mapping

UV mapping is the 3D modeling process of projecting a 3D model's surface to a 2D image for texture mapping. The letters "U" and "V" denote the axes of the 2D texture because "X", "Y", and "Z" are already used to denote the axes of the 3D object in model space, while "W" is used in calculating quaternion rotations, a common operation in computer graphics.

Function Representation is used in solid modeling, volume modeling and computer graphics. FRep was introduced in "Function representation in geometric modeling: concepts, implementation and applications" as a uniform representation of multidimensional geometric objects (shapes). An object as a point set in multidimensional space is defined by a single continuous real-valued function $of point coordinates which is evaluated at the given point by a procedure traversing a tree structure with primitives in the leaves and operations in the nodes of the tree. The points with belong to the object, and the points with are outside of the object. The point set with is called an isosurface.$

The closest pair of points problem or closest pair problem is a problem of computational geometry: given $points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms.$

Image rectification is a transformation process used to project images onto a common image plane. This process has several degrees of freedom and there are many strategies for transforming images to the common plane. Image rectification is used in computer stereo vision to simplify the problem of finding matching points between images, and in geographic information systems to merge images taken from multiple perspectives into a common map coordinate system.

<span class="mw-page-title-main">Pinhole camera model</span> Model of 3D points projected onto planar image via a lens-less aperture

The pinhole camera model describes the mathematical relationship between the coordinates of a point in three-dimensional space and its projection onto the image plane of an ideal pinhole camera, where the camera aperture is described as a point and no lenses are used to focus light. The model does not include, for example, geometric distortions or blurring of unfocused objects caused by lenses and finite sized apertures. It also does not take into account that most practical cameras have only discrete image coordinates. This means that the pinhole camera model can only be used as a first order approximation of the mapping from a 3D scene to a 2D image. Its validity depends on the quality of the camera and, in general, decreases from the center of the image to the edges as lens distortion effects increase.

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.

Hough transforms are techniques for object detection, a critical step in many implementations of computer vision, or data mining from images. Specifically, the Randomized Hough transform is a probabilistic variant to the classical Hough transform, and is commonly used to detect curves The basic idea of Hough transform (HT) is to implement a voting procedure for all potential curves in the image, and at the termination of the algorithm, curves that do exist in the image will have relatively high voting scores. Randomized Hough transform (RHT) is different from HT in that it tries to avoid conducting the computationally expensive voting process for every nonzero pixel in the image by taking advantage of the geometric properties of analytical curves, and thus improve the time efficiency and reduce the storage requirement of the original algorithm.

Geometric feature learning is a technique combining machine learning and computer vision to solve visual tasks. The main goal of this method is to find a set of representative features of geometric form to represent an object by collecting geometric features from images and learning them using efficient machine learning methods. Humans solve visual tasks and can give fast response to the environment by extracting perceptual information from what they see. Researchers simulate humans' ability of recognizing objects to solve computer vision problems. For example, M. Mata et al.(2002) applied feature learning techniques to the mobile robot navigation tasks in order to avoid obstacles. They used genetic algorithms for learning features and recognizing objects (figures). Geometric feature learning methods can not only solve recognition problems but also predict subsequent actions by analyzing a set of sequential input sensory images, usually some extracting features of images. Through learning, some hypothesis of the next action are given and according to the probability of each hypothesis give a most probable action. This technique is widely used in the area of artificial intelligence.

References

↑ A.S. Mian, M. Bennamoun, and R. Owens, Three-dimensional model-based object recognition and segmentation in cluttered scenes., IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, Oct. 2006, pp. 1584-601.
↑ Moll, Mark; Bryant, Drew H.; Kavraki, Lydia E. (2010-11-11). "The LabelHash algorithm for substructure matching". BMC Bioinformatics. 11: 555. doi: 10.1186/1471-2105-11-555 . ISSN 1471-2105. PMC 2996407 . PMID 21070651.
↑ Nussinov, R.; Wolfson, H. J. (1991-12-01). "Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques". Proceedings of the National Academy of Sciences of the United States of America. 88 (23): 10495–10499. doi: 10.1073/pnas.88.23.10495 . ISSN 0027-8424. PMC 52955 . PMID 1961713.

Wolfson, H.J. & Rigoutsos, I (1997). Geometric Hashing: An Overview. IEEE Computational Science and Engineering, 4(4), 10-21.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Mian2006-1] A.S. Mian, M. Bennamoun, and R. Owens, Three-dimensional model-based object recognition and segmentation in cluttered scenes., IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, Oct. 2006, pp. 1584-601.

[2] Moll, Mark; Bryant, Drew H.; Kavraki, Lydia E. (2010-11-11). "The LabelHash algorithm for substructure matching". BMC Bioinformatics. 11: 555. doi: 10.1186/1471-2105-11-555 . ISSN 1471-2105. PMC 2996407 . PMID 21070651.

[3] Nussinov, R.; Wolfson, H. J. (1991-12-01). "Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques". Proceedings of the National Academy of Sciences of the United States of America. 88 (23): 10495–10499. doi: 10.1073/pnas.88.23.10495 . ISSN 0027-8424. PMC 52955 . PMID 1961713.

[1]

[2]

[3]