Comparison of datasets in machine learning

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Kri (talk | contribs) at 16:47, 7 February 2016 (Created comparison page). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The following tables compare some of the datasets that can be used in machine learning for training and testing.
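As a rough illustration of what "training and testing" means for the datasets listed here, the sketch below splits a collection of examples into a training set and a held-out test set. The helper `train_test_split`, its parameters, and the 80/20 ratio are assumptions made for this example only; several of the datasets below ship with their own predefined splits, which should be preferred when available.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and return (train, test) lists.

    Hypothetical helper for illustration; not part of any dataset's tooling.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in data: 100 integer IDs in place of real images or recordings.
examples = list(range(100))
train, test = train_test_split(examples, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

A model is fitted on `train` only, and `test` is used afterwards to estimate how well it generalizes to unseen examples.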

Image datasets

| Dataset | Creator | Free | License[a] | Description | Number of examples (training + test) | Number of categories | Number of annotations | Size (MB) | Web page |
|---|---|---|---|---|---|---|---|---|---|
| Caltech 101 | Fei-Fei Li, Marco Andreetto, Marc'Aurelio Ranzato and Pietro Perona at the California Institute of Technology | Yes | ? | Pictures of objects | 9,146 | 101 | | 131 | [1] |
| Caltech 256 | ? | Yes | ? | Pictures of objects | 30,607 | 256 | | 1,128 | [2] |
| LabelMe | MIT Computer Science and Artificial Intelligence Laboratory | Yes | ? | Pictures of scenes | 187,240 | | 658,992 | ? | [3] |
| MNIST database | ? | Yes | ? | Handwritten digits | 70,000 (60,000 training + 10,000 test) | 10 | | 11 | [4] |
| Overhead Imagery Research Data Set | ? | Yes | ? | Annotated overhead imagery of vehicles | 908 | | 1,800 (approx.) | 161 | [5] |

Facial image datasets

Sound datasets

| Dataset | Creator | Free | License[a] | Description | Number of examples (training + test) | Size (MB) | Web page |
|---|---|---|---|---|---|---|---|
| TIMIT | John Garofolo, Lori Lamel, William Fisher, Jonathan Fiscus, David Pallett, Nancy Dahlgren, Victor Zue | No | LDC User Agreement for Non-Members | Recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences | 6,300 | ? | [6] |

Footnotes

  a. The licenses listed here are summaries and should not be taken as complete statements of the license terms.

See also

References