Random sample consensus

RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an algorithm to improve a set of data values according to some model of agreement.

Overview

The input to the RANSAC algorithm is a set of data values, a model to judge the agreement between data and some confidence parameters.

RANSAC archieves its goal by iteratively selecting a random subset of the original data values. For this set the model is fit, that is, all free parameters of the model are reconstructed from the set. All other data values are tested against the fitted model, that is, for every data value of this test set it is determined how "well" it fits into the fitted model.

Algorithm

The generic RANSAC algorithm works as follows:

Given:
    input - set of input data
    model - model that can test and can be fit
    n - minimum number of data values required to fit the model
    k - the number of iterations required
    t - threshold value for a positive single data element fit
    d - the number of close data values required to assert a model fits well
Return:
    bestfit - set of input data that fits the model best
iterations = 0
goodfits = []
while iterations < k
    randomly select n data values from input, call that set "fitset"
    fit model to "fitset" and call that model "fitmodel"
    closeset = []
    for all data values outsideval in input that are not in "fitset"
        if error of outsideval in regard to fitmodel is smaller than t
            closepoints = closepoints + [outsideval]
    if the number of elements in closepoints is equal to or larger than d
        refit the fitmodel using all points in closepoints and fitset
        goodfits = goodfits + [fitmodel]
return the best fitted model in goodfits as bestfit or none if there is none

While the parameter values of $t$ and $d$ has to be calculated from the individual requirements it can be experimentally determined. The interesting parameter of the RANSAC algorithm is $k$ .

To calculate the parameter $k$ given the known probability $w$ of a good data value, the probability $z$ of seeing only bad data values is used:

z=\left(1-w^{n}\right)^{k}

which leads to

k={\frac {\log(z)}{\log(1-w^{n})}}

To gain additional confidence, the standard deviation or multiples thereof can be added to $k$ . The standard deviation of $k$ is defined as

SD(x)={\frac {\sqrt {1-w^{n}}}{w^{n}}}

A common case is that $w$ is not well known beforehand, but some rough value can be given. If $n$ data values are given, the probability of success is $w^{n}$ .

Applications

The RANSAC algorithm is often used in Computer Vision problems to discard a small percentage of outliers from a sample data set to improve results using later algorithms. A simple example is line fitting to a set of 2-dimensional points.

References

David A. Forsyth, Jean Ponce (2003). Computer Vision, a modern approach. Prentice Hall. ISBN 7-302-07795-9