
Oversampling and undersampling in data analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Winterstein (talk | contribs) at 13:22, 23 March 2009 (created stub). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes or categories represented). They are opposite but roughly equivalent techniques: oversampling raises the number of samples from an under-represented class, while undersampling lowers the number of samples from an over-represented class. Both work by deliberately biasing the selection so that more samples are taken from one class than from another.
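As an illustration only, the following minimal Python sketch shows one simple way each technique could be carried out, using random selection. It assumes that each sample is a (features, label) pair and that the goal is an equal number of samples per class. The function names `random_oversample` and `random_undersample`, the toy data set and the 90/10 class split are hypothetical and chosen for this example; other selection schemes and target ratios are possible.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples):
    """Group (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_oversample(samples, rng):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    groups = group_by_class(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def random_undersample(samples, rng):
    """Keep a random subset of every class, sized to match the smallest one."""
    groups = group_by_class(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Draw without replacement, discarding the remaining samples.
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical imbalanced data set: 90 "majority" and 10 "minority" samples.
data = [((i,), "majority") for i in range(90)]
data += [((i,), "minority") for i in range(10)]

rng = random.Random(0)  # fixed seed so the selection is repeatable
over_counts = Counter(label for _, label in random_oversample(data, rng))
under_counts = Counter(label for _, label in random_undersample(data, rng))

print(over_counts)   # 90 samples in each class
print(under_counts)  # 10 samples in each class
```

In this sketch, oversampling yields a larger data set that contains repeated minority samples, whereas undersampling yields a smaller data set that discards some majority samples. Both end with the same class ratio.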

See also