Random sample consensus

Last updated

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence[ clarify ] on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method. [1] It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981. They used RANSAC to solve the location determination problem (LDP), where the goal is to determine the points in the space that project onto an image into a set of landmarks with known locations.

Contents

RANSAC uses repeated random sub-sampling. [2] A basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some set of model parameters, though may be subject to noise, and "outliers", which are data that do not fit the model. The outliers can come, for example, from extreme values of the noise or from erroneous measurements or incorrect hypotheses about the interpretation of data. RANSAC also assumes that, given a (usually small) set of inliers, there exists a procedure that can estimate the parameters of a model optimally explaining or fitting this data.

Example

A simple example is fitting a line in two dimensions to a set of observations. Assuming that this set contains both inliers, i.e., points which approximately can be fitted to a line, and outliers, points which cannot be fitted to this line, a simple least squares method for line fitting will generally produce a line with a bad fit to the data including inliers and outliers. The reason is that it is optimally fitted to all points, including the outliers. RANSAC, on the other hand, attempts to exclude the outliers and find a linear model that only uses the inliers in its calculation. This is done by fitting linear models to several random samplings of the data and returning the model that has the best fit to a subset of the data. Since the inliers tend to be more linearly related than a random mixture of inliers and outliers, a random subset that consists entirely of inliers will have the best model fit. In practice, there is no guarantee that a subset of inliers will be randomly sampled, and the probability of the algorithm succeeding depends on the proportion of inliers in the data as well as the choice of several algorithm parameters.

Overview

The RANSAC algorithm is a learning technique to estimate parameters of a model by random sampling of observed data. Given a dataset whose data elements contain both inliers and outliers, RANSAC uses the voting scheme to find the optimal fitting result. Data elements in the dataset are used to vote for one or multiple models. The implementation of this voting scheme is based on two assumptions: that the noisy features will not vote consistently for any single model (few outliers) and there are enough features to agree on a good model (few missing data). The RANSAC algorithm is essentially composed of two steps that are iteratively repeated:

  1. A sample subset containing minimal number of data items is randomly selected from the input dataset. A fitting model with model parameters is computed using only the elements of this sample subset. The cardinality of the sample subset (e.g., the amount of data in this subset) is sufficient to determine the model parameters.
  2. The algorithm checks which elements of the entire dataset are consistent with the model instantiated by the estimated model parameters obtained from the first step. A data element will be considered as an outlier if it does not fit the model within some error threshold defining the maximum data deviation of inliers (data elements beyond this deviation are outliers).

The set of inliers obtained for the fitting model is called the consensus set. The RANSAC algorithm will iteratively repeat the above two steps until the obtained consensus set in certain iteration has enough inliers.

The input to the RANSAC algorithm is a set of observed data values, a model to fit to the observations, and some confidence parameters defining outliers. In more details than the aforementioned RANSAC algorithm overview, RANSAC achieves its goal by repeating the following steps:

  1. Select a random subset of the original data. Call this subset the hypothetical inliers.
  2. A model is fitted to the set of hypothetical inliers.
  3. All data are then tested against the fitted model. All the data points (of the original data) that fit the estimated model well, according to some model-specific loss function, are called the consensus set (i.e., the set of inliers for the model).
  4. The estimated model is reasonably good if sufficiently many data points have been classified as a part of the consensus set.
  5. The model may be improved by re-estimating it by using all the members of the consensus set. The fitting quality as a measure of how well the model fits to the consensus set will be used to sharpen the model fitting as iterations goes on (e.g., by setting this measure as the fitting quality criteria at the next iteration).

To converge to a sufficiently good model parameter set, this procedure is repeated a fixed number of times, each time producing either the rejection of a model because too few points are a part of the consensus set, or a refined model with a consensus set size larger than the previous consensus set.

RANSAC: inliers and outliers. The linear fitting to data points in this example is with 7 inliers (data points fitted well with the model under some criteria). It is not a good fitting, since there is a linear line where most data points are distributed near it (i.e., more inliers). RANSAC Inliers and Outliers.png
RANSAC: inliers and outliers. The linear fitting to data points in this example is with 7 inliers (data points fitted well with the model under some criteria). It is not a good fitting, since there is a linear line where most data points are distributed near it (i.e., more inliers).

Pseudocode

The generic RANSAC algorithm works as the following pseudocode:

Given:     data – A set of observations.     model – A model to explain the observed data points.     n – The minimum number of data points required to estimate the model parameters.     k – The maximum number of iterations allowed in the algorithm.     t – A threshold value to determine data points that are fit well by the model (inlier).     d – The number of close data points (inliers) required to assert that the model fits well to the data.  Return:     bestFit – The model parameters which may best fit the data (or null if no good model is found).   iterations = 0 bestFit = null bestErr = something really large // This parameter is used to sharpen the model parameters to the best data fitting as iterations go on.  whileiterations < kdo     maybeInliers := n randomly selected values from data     maybeModel := model parameters fitted to maybeInliers     confirmedInliers := empty set     for every point in data doif point fits maybeModel with an error smaller than t then              add point to confirmedInliers         end ifend forif the number of elements in confirmedInliers is > d then         // This implies that we may have found a good model.         // Now test how good it is.         betterModel := model parameters fitted to all the points in confirmedInliers         thisErr := a measure of how well betterModel fits these points         if thisErr < bestErr then             bestFit := betterModel             bestErr := thisErr         end ifend if     increment iterations end whilereturn bestFit

Example code

A Python implementation mirroring the pseudocode. This also defines a LinearRegressor based on least squares, applies RANSAC to a 2D regression problem, and visualizes the outcome:

fromcopyimportcopyimportnumpyasnpfromnumpy.randomimportdefault_rngrng=default_rng()classRANSAC:def__init__(self,n=10,k=100,t=0.05,d=10,model=None,loss=None,metric=None):self.n=n# `n`: Minimum number of data points to estimate parametersself.k=k# `k`: Maximum iterations allowedself.t=t# `t`: Threshold value to determine if points are fit wellself.d=d# `d`: Number of close data points required to assert model fits wellself.model=model# `model`: class implementing `fit` and `predict`self.loss=loss# `loss`: function of `y_true` and `y_pred` that returns a vectorself.metric=metric# `metric`: function of `y_true` and `y_pred` and returns a floatself.best_fit=Noneself.best_error=np.infdeffit(self,X,y):for_inrange(self.k):ids=rng.permutation(X.shape[0])maybe_inliers=ids[:self.n]maybe_model=copy(self.model).fit(X[maybe_inliers],y[maybe_inliers])thresholded=(self.loss(y[ids][self.n:],maybe_model.predict(X[ids][self.n:]))<self.t)inlier_ids=ids[self.n:][np.flatnonzero(thresholded).flatten()]ifinlier_ids.size>self.d:inlier_points=np.hstack([maybe_inliers,inlier_ids])better_model=copy(self.model).fit(X[inlier_points],y[inlier_points])this_error=self.metric(y[inlier_points],better_model.predict(X[inlier_points]))ifthis_error<self.best_error:self.best_error=this_errorself.best_fit=better_modelreturnselfdefpredict(self,X):returnself.best_fit.predict(X)defsquare_error_loss(y_true,y_pred):return(y_true-y_pred)**2defmean_square_error(y_true,y_pred):returnnp.sum(square_error_loss(y_true,y_pred))/y_true.shape[0]classLinearRegressor:def__init__(self):self.params=Nonedeffit(self,X:np.ndarray,y:np.ndarray):r,_=X.shapeX=np.hstack([np.ones((r,1)),X])self.params=np.linalg.inv(X.T@X)@X.T@yreturnselfdefpredict(self,X:np.ndarray):r,_=X.shapeX=np.hstack([np.ones((r,1)),X])returnX@self.paramsif__name__=="__main__":regressor=RANSAC(model=LinearRegressor(),loss=square_error_loss,metric=mean_square_error)X=np.array([-0.848,-0.800,-0.704,-0.632,-0.488,-0.472,-0.368,-0.336,-0.280,-0.200,-0.00800,-0.0840,0.0240,0.100,0.124,0.148,0.232,0.236,0.324,0.356,0.368,0.440,0.512,0.548,0.660,0.640,0.712,0.752,0.776,0.880,0.920,0.944,-0.108,-0.168,-0.720,-0.784,-0.224,-0.604,-0.740,-0.0440,0.388,-0.0200,0.752,0.416,-0.0800,-0.348,0.988,0.776,0.680,0.880,-0.816,-0.424,-0.932,0.272,-0.556,-0.568,-0.600,-0.716,-0.796,-0.880,-0.972,-0.916,0.816,0.892,0.956,0.980,0.988,0.992,0.00400]).reshape(-1,1)y=np.array([-0.917,-0.833,-0.801,-0.665,-0.605,-0.545,-0.509,-0.433,-0.397,-0.281,-0.205,-0.169,-0.0531,-0.0651,0.0349,0.0829,0.0589,0.175,0.179,0.191,0.259,0.287,0.359,0.395,0.483,0.539,0.543,0.603,0.667,0.679,0.751,0.803,-0.265,-0.341,0.111,-0.113,0.547,0.791,0.551,0.347,0.975,0.943,-0.249,-0.769,-0.625,-0.861,-0.749,-0.945,-0.493,0.163,-0.469,0.0669,0.891,0.623,-0.609,-0.677,-0.721,-0.745,-0.885,-0.897,-0.969,-0.949,0.707,0.783,0.859,0.979,0.811,0.891,-0.137]).reshape(-1,1)regressor.fit(X,y)importmatplotlib.pyplotaspltplt.style.use("seaborn-darkgrid")fig,ax=plt.subplots(1,1)ax.set_box_aspect(1)plt.scatter(X,y)line=np.linspace(-1,1,num=100).reshape(-1,1)plt.plot(line,regressor.predict(line),c="peru")plt.show()
Result of running the RANSAC implementation. The orange line shows the least-squares parameters found by the iterative approach, which successfully ignores the outlier points. RANSAC applied to 2D regression problem.png
Result of running the RANSAC implementation. The orange line shows the least-squares parameters found by the iterative approach, which successfully ignores the outlier points.

Parameters

The threshold value to determine when a data point fits a model (t), and the number of inliers (data points fitted to the model within t) required to assert that the model fits well to data (d) are determined based on specific requirements of the application and the dataset, and possibly based on experimental evaluation. The number of iterations (k), however, can be roughly determined as a function of the desired probability of success (p) as shown below.

Let p be the desired probability that the RANSAC algorithm provides at least one useful result after running. In extreme (for simplifying the derivation), RANSAC returns a successful result if in some iteration it selects only inliers from the input data set when it chooses n points from the data set from which the model parameters are estimated. (In other words, all the selected n data points are inliers of the model estimated by these points). Let be the probability of choosing an inlier each time a single data point is selected, that is roughly,

= number of inliers in data / number of points in data

A common case is that is not well known beforehand because of an unknown number of inliers in data before running the RANSAC algorithm, but some rough value can be given. With a given rough value of and roughly assuming that the n points needed for estimating a model are selected independently (It is a rough assumption because each data point selection reduces the number of data point candidates to choose in the next selection in reality), is the probability that all n points are inliers and is the probability that at least one of the n points is an outlier, a case which implies that a bad model will be estimated from this point set. That probability to the power of k (the number of iterations in running the algorithm) is the probability that the algorithm never selects a set of n points which all are inliers, and this is the same as (the probability that the algorithm does not result in a successful model estimation) in extreme. Consequently,

which, after taking the logarithm of both sides, leads to

This result assumes that the n data points are selected independently, that is, a point which has been selected once is replaced and can be selected again in the same iteration. This is often not a reasonable approach and the derived value for k should be taken as an upper limit in the case that the points are selected without replacement. For example, in the case of finding a line which fits the data set illustrated in the above figure, the RANSAC algorithm typically chooses two points in each iteration and computes maybe_model as the line between the points and it is then critical that the two points are distinct.

To gain additional confidence, the standard deviation or multiples thereof can be added to k. The standard deviation of k is defined as

Advantages and disadvantages

An advantage of RANSAC is its ability to do robust estimation [3] of the model parameters, i.e., it can estimate the parameters with a high degree of accuracy even when a significant number of outliers are present in the data set. A disadvantage of RANSAC is that there is no upper bound on the time it takes to compute these parameters (except exhaustion). When the number of iterations computed is limited, the solution obtained may not be optimal, and it may not even be one that fits the data in a good way. In this way RANSAC offers a trade-off; by computing a greater number of iterations, the probability of a reasonable model being produced is increased. Moreover, RANSAC is not always able to find the optimal set even for moderately contaminated sets, and it usually performs badly when the number of inliers is less than 50%. Optimal RANSAC [4] was proposed to handle both these problems and is capable of finding the optimal set for heavily contaminated sets, even for an inlier ratio under 5%. Another disadvantage of RANSAC is that it requires the setting of problem-specific thresholds.

RANSAC can only estimate one model for a particular data set. As for any one-model approach when two (or more) model instances exist, RANSAC may fail to find either one. The Hough transform is one alternative robust estimation technique that may be useful when more than one model instance is present. Another approach for multi-model fitting is known as PEARL, [5] which combines model sampling from data points as in RANSAC with iterative re-estimation of inliers and the multi-model fitting being formulated as an optimization problem with a global energy function describing the quality of the overall solution.

Applications

The RANSAC algorithm is often used in computer vision, e.g., to simultaneously solve the correspondence problem and estimate the fundamental matrix related to a pair of stereo cameras; see also: Structure from motion, scale-invariant feature transform, image stitching, rigid motion segmentation.

Development and improvements

Since 1981 RANSAC has become a fundamental tool in the computer vision and image processing community. In 2006, for the 25th anniversary of the algorithm, a workshop was organized at the International Conference on Computer Vision and Pattern Recognition (CVPR) to summarize the most recent contributions and variations to the original algorithm, mostly meant to improve the speed of the algorithm, the robustness and accuracy of the estimated solution and to decrease the dependency from user defined constants.

RANSAC can be sensitive to the choice of the correct noise threshold that defines which data points fit a model instantiated with a certain set of parameters. If such threshold is too large, then all the hypotheses tend to be ranked equally (good). On the other hand, when the noise threshold is too small, the estimated parameters tend to be unstable ( i.e. by simply adding or removing a datum to the set of inliers, the estimate of the parameters may fluctuate). To partially compensate for this undesirable effect, Torr et al. proposed two modification of RANSAC called MSAC (M-estimator SAmple and Consensus) and MLESAC (Maximum Likelihood Estimation SAmple and Consensus). [6] The main idea is to evaluate the quality of the consensus set ( i.e. the data that fit a model and a certain set of parameters) calculating its likelihood (whereas in the original formulation by Fischler and Bolles the rank was the cardinality of such set). An extension to MLESAC which takes into account the prior probabilities associated to the input dataset is proposed by Tordoff. [7] The resulting algorithm is dubbed Guided-MLESAC. Along similar lines, Chum proposed to guide the sampling procedure if some a priori information regarding the input data is known, i.e. whether a datum is likely to be an inlier or an outlier. The proposed approach is called PROSAC, PROgressive SAmple Consensus. [8]

Chum et al. also proposed a randomized version of RANSAC called R-RANSAC [9] to reduce the computational burden to identify a good consensus set. The basic idea is to initially evaluate the goodness of the currently instantiated model using only a reduced set of points instead of the entire dataset. A sound strategy will tell with high confidence when it is the case to evaluate the fitting of the entire dataset or when the model can be readily discarded. It is reasonable to think that the impact of this approach is more relevant in cases where the percentage of inliers is large. The type of strategy proposed by Chum et al. is called preemption scheme. Nistér proposed a paradigm called Preemptive RANSAC [10] that allows real time robust estimation of the structure of a scene and of the motion of the camera. The core idea of the approach consists in generating a fixed number of hypotheses so that the comparison happens with respect to the quality of the generated hypothesis rather than against some absolute quality metric.

Other researchers tried to cope with difficult situations where the noise scale is not known and/or multiple model instances are present. The first problem has been tackled in the work by Wang and Suter. [11] Toldo et al. represent each datum with the characteristic function of the set of random models that fit the point. Then multiple models are revealed as clusters which group the points supporting the same model. The clustering algorithm, called J-linkage, does not require prior specification of the number of models, nor does it necessitate manual parameters tuning. [12]

RANSAC has also been tailored for recursive state estimation applications, where the input measurements are corrupted by outliers and Kalman filter approaches, which rely on a Gaussian distribution of the measurement error, are doomed to fail. Such an approach is dubbed KALMANSAC. [13]

See also

Notes

  1. Data Fitting and Uncertainty, T. Strutz, Springer Vieweg (2nd edition, 2016).
  2. Cantzler, H. "Random Sample Consensus (RANSAC)". Institute for Perception, Action and Behaviour, Division of Informatics, University of Edinburgh. CiteSeerX   10.1.1.106.3035 . Archived from the original on 2023-02-04.
  3. Robust Statistics, Peter. J. Huber, Wiley, 1981 (republished in paperback, 2004), page 1.
  4. Anders Hast, Johan Nysjö, Andrea Marchetti (2013). "Optimal RANSAC – Towards a Repeatable Algorithm for Finding the Optimal Set". Journal of WSCG 21 (1): 21–30.
  5. Hossam Isack, Yuri Boykov (2012). "Energy-based Geometric Multi-Model Fitting". International Journal of Computer Vision 97 (2: 1): 23–147. doi : 10.1007/s11263-011-0474-7.
  6. P.H.S. Torr and A. Zisserman, MLESAC: A new robust estimator with application to estimating image geometry [ dead link ], Journal of Computer Vision and Image Understanding 78 (2000), no. 1, 138–156.
  7. B. J. Tordoff and D. W. Murray, Guided-MLESAC: Faster image transform estimation by using matching priors, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005), no. 10, 1523–1535.
  8. Matching with PROSAC – progressive sample consensus, Proceedings of Conference on Computer Vision and Pattern Recognition (San Diego), vol. 1, June 2005, pp. 220–226
  9. O. Chum and J. Matas, Randomized RANSAC with Td,d test, 13th British Machine Vision Conference, September 2002. http://www.bmva.org/bmvc/2002/papers/50/
  10. D. Nistér, Preemptive RANSAC for live structure and motion estimation, IEEE International Conference on Computer Vision (Nice, France), October 2003, pp. 199–206.
  11. H. Wang and D. Suter, Robust adaptive-scale parametric model estimation for computer vision., IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004), no. 11, 1459–1474
  12. R. Toldo and A. Fusiello, Robust multiple structures estimation with J-linkage, European Conference on Computer Vision (Marseille, France), October 2008, pp. 537–547.
  13. A. Vedaldi, H. Jin, P. Favaro, and S. Soatto, KALMANSAC: Robust filtering by consensus, Proceedings of the International Conference on Computer Vision (ICCV), vol. 1, 2005, pp. 633–640
  14. Brahmachari, Aveek S.; Sarkar, Sudeep (March 2013). "Hop-Diffusion Monte Carlo for Epipolar Geometry Estimation between Very Wide-Baseline Images". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (3): 755–762. doi:10.1109/TPAMI.2012.227. PMID   26353140. S2CID   2524656.
  15. W. Ruoyan and W. Junfeng, "FSASAC: Random Sample Consensus Based on Data Filter and Simulated Annealing," in IEEE Access, vol. 9, pp. 164935-164948, 2021, doi: 10.1109/ACCESS.2021.3135416.

Related Research Articles

<span class="mw-page-title-main">Least squares</span> Approximation method in statistics

The method of least squares is a parameter estimation method in regression analysis based on minimizing the sum of the squares of the residuals made in the results of each individual equation.

<span class="mw-page-title-main">Outlier</span> Observation far apart from others in statistics and data science

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are sometimes excluded from the data set. An outlier can be an indication of exciting possibility, but can also cause serious problems in statistical analyses.

<span class="mw-page-title-main">Cross-validation (statistics)</span> Statistical model validation technique

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters.

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture models are used for clustering, under the name model-based clustering, and also for density estimation.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

<span class="mw-page-title-main">Image stitching</span> Combining multiple photographic images with overlapping fields of view

Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally.

In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods are:

  1. Permutation tests
  2. Bootstrapping
  3. Cross validation
  4. Jackknife
<span class="mw-page-title-main">Local regression</span> Moving average and polynomial regression method for smoothing data

Local regression or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression. Its most common methods, initially developed for scatterplot smoothing, are LOESS and LOWESS, both pronounced LOH-ess. They are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. In some fields, LOESS is known and commonly referred to as Savitzky–Golay filter.

Mean shift is a non-parametric feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing.

Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based on minimizing the sum of absolute deviations or the L1 norm of such values. It is analogous to the least squares technique, except that it is based on absolute values instead of squared values. It attempts to find a function which closely approximates a set of data by minimizing residuals between points generated by the function and corresponding data points. The LAD estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution. It was introduced in 1757 by Roger Joseph Boscovich.

<span class="mw-page-title-main">Point Cloud Library</span> Open-source algorithm library

The Point Cloud Library (PCL) is an open-source library of algorithms for point cloud processing tasks and 3D geometry processing, such as occur in three-dimensional computer vision. The library contains algorithms for filtering, feature estimation, surface reconstruction, 3D registration, model fitting, object recognition, and segmentation. Each module is implemented as a smaller library that can be compiled separately. PCL has its own data format for storing point clouds - PCD, but also allows datasets to be loaded and saved in many other formats. It is written in C++ and released under the BSD license.

<span class="mw-page-title-main">Theil–Sen estimator</span> Statistical method for fitting a line

In non-parametric statistics, the Theil–Sen estimator is a method for robustly fitting a line to sample points in the plane by choosing the median of the slopes of all lines through pairs of points. It has also been called Sen's slope estimator, slope selection, the single median method, the Kendall robust line-fit method, and the Kendall–Theil robust line. It is named after Henri Theil and Pranab K. Sen, who published papers on this method in 1950 and 1968 respectively, and after Maurice Kendall because of its relation to the Kendall tau rank correlation coefficient.

<span class="mw-page-title-main">Point-set registration</span> Process of finding a spatial transformation that aligns two point clouds

In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model, and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging.

Spatial verification is a technique in which similar locations can be identified in an automated way through a sequence of images. The general method involves identifying a correlation between certain points among sets images, using techniques similar to those used for image registration.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques labels the pixels to be a part of pixels with certain characteristics at a particular time. Here, the pixels are segmented depending on its relative movement over a period of time i.e. the time of the video sequence.

Perspective-n-Point is the problem of estimating the pose of a calibrated camera given a set of n 3D points in the world and their corresponding 2D projections in the image. The camera pose consists of 6 degrees-of-freedom (DOF) which are made up of the rotation and 3D translation of the camera with respect to the world. This problem originates from camera calibration and has many applications in computer vision and other areas, including 3D pose estimation, robotics and augmented reality. A commonly used solution to the problem exists for n = 3 called P3P, and many solutions are available for the general case of n ≥ 3. A solution for n = 2 exists if feature orientations are available at the two points. Implementations of these solutions are also available in open source software.

In statistics, linear regression is a model that estimates the linear relationship between a scalar response and one or more explanatory variables. A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.

Isolation Forest is an algorithm for data anomaly detection using binary trees. It was developed by Fei Tony Liu in 2008. It has a linear time complexity and a low memory use, which works well for high-volume data. It is based on the assumption that because anomalies are few and different from other data, they can be isolated using few partitions. Like decision tree algorithms, it does not perform density estimation. Unlike decision tree algorithms, it uses only path length to output an anomaly score, and does not use leaf node statistics of class distribution or target value.

References