Spatial verification

Spatial verification is a technique by which similar locations can be identified automatically across a sequence of images. The general method involves identifying correspondences between certain points in different sets of images, using techniques similar to those used for image registration.

The main problem is that outliers (points that do not fit or match the selected model) distort the least-squares adjustment: a numerical-analysis technique, framed in mathematical optimization, which, given a set of ordered pairs (independent variable, dependent variable) and a family of functions, tries to find the continuous function in that family that best approximates the data.
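
As a minimal toy illustration of this sensitivity (hypothetical data, chosen only for the example), a single gross outlier is enough to pull an ordinary least-squares line fit well away from the trend of the remaining points:

```python
import numpy as np

# Ten points that lie exactly on y = 2x + 1, except for one outlier.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[9] = -40.0  # a single gross outlier

# Ordinary least squares weighs every point, so the outlier skews the fit.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # noticeably far from the true (2.0, 1.0)
```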

Methods

The methods most widely used to perform spatial verification and avoid the errors caused by these outliers are:

Figure: a line fitted with RANSAC.

Random sample consensus (RANSAC)

RANSAC seeks to avoid the impact of outliers, which do not fit the model, by considering only the inliers that match the model in question. If an outlier is chosen to compute the current fit, the resulting line will have little support from the rest of the points. The algorithm is a loop that performs the following steps:

  1. From the entire input data set, randomly take a subset with which to estimate the model.
  2. Compute the model from the subset, using standard linear estimation algorithms.
  3. Find the correspondences that are consistent with the estimated transformation.
  4. If the model's error is small enough, the model is accepted; if the number of correspondences is large enough, the subset of points involved is called the consensus set, and the model is re-estimated from all of its correspondences.

The goal is to keep the model with the highest number of matches; the main difficulty is deciding how many times the process must be repeated to obtain the best estimate of the model. RANSAC fixes the number of iterations in advance, as in the sketch below.
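
A minimal sketch of this loop on a 2D line-fitting problem, in Python with NumPy; the function name, threshold, and iteration count are illustrative choices, not a standard API:

```python
import numpy as np

def ransac_line(points, n_iters=100, threshold=0.1):
    """Fit y = slope*x + intercept to an (N, 2) array, ignoring outliers."""
    best_model, best_support = None, 0
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # 1. Randomly sample a minimal subset: two points define a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):
            continue  # degenerate (vertical) sample, try again
        # 2. Estimate the model from the subset.
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # 3. Find the correspondences consistent with this model.
        residuals = np.abs(points[:, 1] - (slope * points[:, 0] + intercept))
        inliers = residuals < threshold
        # 4. Keep the model with the largest consensus set and re-estimate
        #    it from all of its inliers.
        if inliers.sum() > best_support:
            best_support = int(inliers.sum())
            best_model = np.polyfit(points[inliers, 0], points[inliers, 1], 1)
    return best_model  # (slope, intercept), or None if all samples degenerate
```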

To verify scenes or objects, affine transformations are commonly used to perform the spatial verification.
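
As one possible realization, OpenCV's estimateAffine2D can robustly estimate such a transformation with RANSAC from matched point coordinates; the synthetic matches below are purely illustrative:

```python
import numpy as np
import cv2

# Synthetic matched keypoints: dst is src under a known affine map, with a
# few matches corrupted to act as outliers.
rng = np.random.default_rng(1)
src = rng.uniform(0, 200, size=(30, 2)).astype(np.float32)
A = np.array([[0.9, -0.1, 5.0],
              [0.1, 0.9, -3.0]], dtype=np.float32)  # ground-truth 2x3 affine
dst = src @ A[:, :2].T + A[:, 2]
dst[:5] += rng.uniform(20, 60, size=(5, 2)).astype(np.float32)  # 5 bad matches

# Robustly estimate the affine transform; RANSAC flags inconsistent matches.
M, inlier_mask = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=3.0)
print(M)                    # should be close to A
print(inlier_mask.ravel())  # 0 for the corrupted matches
```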

Generalized Hough transform (GHT)

This is a technique for detecting shapes in digital images that performs the spatial verification by clustering the points belonging to the model through a voting procedure over a set of parametric figures.

Rather than checking every possible combination of features by fitting a model to every possible subset, a voting technique is used, in which a vote is stored for each possible line on which each point could lie. The lines that receive the most votes are then selected, as in the sketch below.
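
The following is a toy NumPy sketch of this voting scheme for straight lines, parametrized as rho = x·cos(theta) + y·sin(theta); the bin counts and coordinate range are arbitrary illustrative values:

```python
import numpy as np

def hough_line_votes(points, n_theta=180, n_rho=200, rho_max=300.0):
    """Vote in (theta, rho) space for lines rho = x*cos(theta) + y*sin(theta)."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    accumulator = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        # Each point votes once per theta bin, for the rho of the line
        # through it with that orientation.
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (rho_idx >= 0) & (rho_idx < n_rho)
        accumulator[np.where(ok)[0], rho_idx[ok]] += 1
    # The accumulator cell with the most votes is the best-supported line.
    t_idx, r_idx = np.unravel_index(accumulator.argmax(), accumulator.shape)
    best_theta = thetas[t_idx]
    best_rho = r_idx / (n_rho - 1) * (2 * rho_max) - rho_max
    return best_theta, best_rho
```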

If we use local features that are invariant to scale, rotation and translation, each feature match gives an alignment hypothesis for the scale, translation and orientation of the model in the image.

One hypothesis generated by a single match can be unreliable, so each match casts a vote in the Hough space, where mutually consistent votes accumulate into a stronger hypothesis. This gives two major phases: a voting phase, in which every match votes for the model poses it is consistent with, and a verification phase, in which the best-supported poses are checked against the matches (see the sketch below).
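
A toy sketch of this per-match pose voting; the (x, y, scale, orientation) feature format and the bin sizes are assumptions made for illustration, not a fixed interface:

```python
import numpy as np
from collections import Counter

def vote_pose(matches, t_bin=20.0, s_bin=0.5, o_bin=np.pi / 6):
    """Each match votes for one binned pose hypothesis.

    matches: iterable of ((mx, my, ms, mo), (ix, iy, s, o)) pairs, giving a
    feature's (x, y, scale, orientation) in the model and in the image."""
    votes = Counter()
    for (mx, my, ms, mo), (ix, iy, s, o) in matches:
        ds = s / ms                   # hypothesized scale change
        do = (o - mo) % (2 * np.pi)   # hypothesized rotation
        # Map the model position through the hypothesized similarity and
        # read off the implied translation.
        c, sn = np.cos(do), np.sin(do)
        tx = ix - ds * (c * mx - sn * my)
        ty = iy - ds * (sn * mx + c * my)
        key = (round(tx / t_bin), round(ty / t_bin),
               round(np.log2(ds) / s_bin), round(do / o_bin))
        votes[key] += 1
    # The most-voted bin is the strongest pose hypothesis (assumes >= 1 match);
    # its member matches would then go to a verification step (e.g. RANSAC).
    return votes.most_common(1)[0]
```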

The main disadvantages are:

  1. The memory and computation required by the accumulator grow with the number of voting cells, so the technique scales poorly to high-dimensional parameter spaces.
  2. The result is sensitive to the quantization of the parameter space: bins that are too coarse merge distinct hypotheses, while bins that are too fine scatter consistent votes.

Comparison

Differences between RANSAC and GHT:

Correspondence: GHT uses a single correspondence, voting for all consistent parameters; RANSAC uses a minimal subset of correspondences to estimate the model and then counts the inliers.
Representation: GHT represents uncertainty in the model (parameter) space; RANSAC represents uncertainty in the image space.
Complexity: GHT is linear in the number of correspondences and the number of voting cells; RANSAC must scan all data points to check for inliers in every iteration.
Advantages: GHT can tolerate a higher fraction of outliers; RANSAC scales better to parameter spaces with more dimensions.

Related Research Articles

Image registration – Mapping of data into a single system

Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements.

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

Solid modeling – Set of principles for modeling solid geometry

Solid modeling is a consistent set of principles for mathematical and computer modeling of three-dimensional shapes (solids). Solid modeling is distinguished within the broader related areas of geometric modeling and computer graphics, such as 3D modeling, by its emphasis on physical fidelity. Together, the principles of geometric and solid modeling form the foundation of 3D-computer-aided design and in general support the creation, exchange, visualization, animation, interrogation, and annotation of digital models of physical objects.

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981. They used RANSAC to solve the Location Determination Problem (LDP), where the goal is to determine the points in the space that project onto an image into a set of landmarks with known locations.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

The condensation algorithm is a computer vision algorithm. The principal application is to detect and track the contour of objects moving in a cluttered environment. Object tracking is one of the more basic and difficult aspects of computer vision and is generally a prerequisite to object recognition. Being able to identify which pixels in an image make up the contour of an object is a non-trivial problem. Condensation is a probabilistic algorithm that attempts to solve this problem.

Iterative closest point

Iterative closest point (ICP) is an algorithm employed to minimize the difference between two clouds of points. ICP is often used to reconstruct 2D or 3D surfaces from different scans, to localize robots and achieve optimal path planning, to co-register bone models, etc.

Image stitching – Combining multiple photographic images with overlapping fields of view

Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally.

In the fields of computing and computer vision, pose represents the position and orientation of an object, usually in three dimensions. Poses are often stored internally as transformation matrices. The term “pose” is largely synonymous with the term “transform”, but a transform may often include scale, whereas pose does not.

In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

The generalized Hough transform (GHT), introduced by Dana H. Ballard in 1981, is the modification of the Hough transform using the principle of template matching. The Hough transform was initially developed to detect analytically defined shapes. In these cases, we have knowledge of the shape and aim to find out its location and orientation in the image. This modification enables the Hough transform to be used to detect an arbitrary object described with its model.

The Kadir–Brady saliency detector extracts features of objects in images that are distinct and representative. It was invented by Timor Kadir and J. Michael Brady in 2001; an affine-invariant version was introduced by Kadir and Brady in 2004, and a robust version was designed by Shao et al. in 2007.

Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the image of the objects may vary somewhat under different viewpoints, at many different sizes and scales, or when they are translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades.

In computer vision, 3D object recognition involves recognizing and determining 3D information, such as the pose, volume, or shape, of user-chosen 3D objects in a photograph or range scan. Typically, an example of the object to be recognized is presented to a vision system in a controlled environment, and then for an arbitrary input such as a video stream, the system locates the previously presented object. This can be done either off-line, or in real-time. The algorithms for solving this problem are specialized for locating a single pre-identified object, and can be contrasted with algorithms which operate on general classes of objects, such as face recognition systems or 3D generic object recognition. Due to the low cost and ease of acquiring photographs, a significant amount of research has been devoted to 3D object recognition in photographs.

In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to the wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

Shape context is a feature descriptor used in object recognition. Serge Belongie and Jitendra Malik proposed the term in their paper "Matching with Shape Contexts" in 2000.

3D reconstruction – Process of capturing the shape and appearance of real objects

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.

3D pose estimation – Process of determining spatial characteristics of objects

3D pose estimation is a process of predicting the transformation of an object from a user-defined reference pose, given an image or a 3D scan. It arises in computer vision or robotics, where the pose or transformation of an object can be used for alignment of computer-aided design models, identification, grasping, or manipulation of the object.

Point-set registration – Process of finding a spatial transformation that aligns two point clouds

In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model, and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques label pixels that share certain characteristics at a particular time. Here, the pixels are segmented according to their relative motion over a period of time, i.e. the duration of the video sequence.
