Comparison of datasets in machine learning

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Kri at 13:20, 26 February 2016 (Merge -> Merge to). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The following tables compare some of the datasets that can be used in machine learning for training and testing.

Image datasets by name

General image datasets

Dataset | Creator | Free | License[a] | Description | Number of examples (training + test) | Number of categories | Number of annotations | Size (MB) | Website
Caltech 101 | Fei-Fei Li, Marco Andreetto, Marc'Aurelio Ranzato and Pietro Perona at the California Institute of Technology | Yes | ? | Pictures of objects | 9,146 | 101 |  | 131 | [1]
Caltech 256 | ? | Yes | ? | Pictures of objects | 30,607 | 256 |  | 1,128 | [2]
ImageNet | Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei; Stanford University | Yes | Varied - Image URLs | Images for WordNet nouns | 14,197,122 | 1000 | 25 | ? | [3]
LabelMe | MIT Computer Science and Artificial Intelligence Laboratory | Yes | ? | Pictures of scenes | 187,240 |  | 658,992 | ? | [4]
MNIST database | ? | Yes | ? | Handwritten digits | 60,000 | 10 |  | 11 | [5]
MSCOCO (Common Objects in Context) | Tsung-Yi Lin et al.; Microsoft Research | Yes | Creative Commons for Image Annotations | Images with multiple objects | 325,000 |  | 2,500,000; 5 captions per image (325k x 5 = 1.63M) | ? | [6]
Overhead Imagery Research Data Set | ? | Yes | ? | Overhead images | 908 |  | 1,800 (approx.) | 161 | [7]

Facial image datasets

Sound datasets by name

Dataset | Creator | Free | License[a] | Description | Number of examples (training + test) | Size (MB) | Website
TIMIT | John Garofolo, Lori Lamel, William Fisher, Jonathan Fiscus, David Pallett, Nancy Dahlgren, Victor Zue | No | LDC User Agreement for Non-Members | Recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences | 6,300 | ? | [8]

Footnotes

  a. Licenses here are a summary and are not to be taken as complete statements of the licenses.
