Comparison of datasets in machine learning

The following tables compare some of the datasets that can be used in machine learning for training and testing.

Image datasets

Dataset	Creator	Free	License^[a]	Description	Number of examples (training + test)	Number of categories	Number of annotations	Size (MB)	Web page
Caltech 101	Fei-Fei Li, Marco Andreetto, Marc'Aurelio Ranzato, and Pietro Perona at the California Institute of Technology	Yes	?	Pictures of objects	9,146	101	—	131	[1]
Caltech 256	?	Yes	?	Pictures of objects	30,607	256	—	1,128	[2]
LabelMe	MIT Computer Science and Artificial Intelligence Laboratory	Yes	?	Pictures of scenes	187,240	—	658,992	?	[3]
MNIST database	?	Yes	?	Handwritten digits	70,000 (60,000 training + 10,000 test)	10	—	11	[4]
Overhead Imagery Research Data Set	?	Yes	?	Annotated overhead imagery	908	—	1,800 (approx.)	161	[5]
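
The counts in the table can be turned into rough density figures, such as examples per category or annotations per example. The following is a minimal sketch rather than part of any dataset's tooling: the dictionary layout and the helper names `ratio` and `summarize` are assumptions made for illustration, and the numbers are copied from the Caltech 101, Caltech 256, and LabelMe rows above.

```python
# Figures copied from the image-dataset table; cells with no value become None.
# The dictionary layout and function names are illustrative assumptions.
image_datasets = {
    "Caltech 101": {"examples": 9_146, "categories": 101, "annotations": None},
    "Caltech 256": {"examples": 30_607, "categories": 256, "annotations": None},
    "LabelMe": {"examples": 187_240, "categories": None, "annotations": 658_992},
}


def ratio(numerator, denominator):
    """Return numerator / denominator, or None if either value is missing."""
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


def summarize(datasets):
    """Compute examples per category and annotations per example."""
    summary = {}
    for name, row in datasets.items():
        summary[name] = {
            "examples_per_category": ratio(row["examples"], row["categories"]),
            "annotations_per_example": ratio(row["annotations"], row["examples"]),
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(image_datasets).items():
        print(name, stats)
```

These are averages only. They say nothing about how evenly the examples are spread across categories, which varies between datasets.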

Speech datasets

Dataset	Creator	Free	License^[a]	Description	Number of examples (training + test)	Size (MB)	Web page
TIMIT	John Garofolo, Lori Lamel, William Fisher, Jonathan Fiscus, David Pallett, Nancy Dahlgren, Victor Zue	No	LDC User Agreement for Non-Members	Recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences	6,300	?	[6]

^[a] Licenses listed here are summaries and should not be taken as complete statements of the licenses.