Nearest centroid classifier

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Qwertyus (talk | contribs) at 12:11, 28 March 2012 (Created page with 'In machine learning, a '''nearest centroid''' or '''nearest prototype classifier''' is a classification model that assigns to ...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In machine learning, a nearest centroid or nearest prototype classifier is a classification model that assigns to each observation the label of the class whose training samples have the mean (centroid) closest to that observation.

When applied to text classification using tf-idf vectors to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]

Algorithm

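  • Training procedure: given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.
  • Prediction function: the class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|$.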
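
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance as the measure of closeness, and the function names `fit_centroids` and `predict` and the toy data are invented for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(samples, labels):
    """Return a dict mapping each class label to the mean of its samples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        # Accumulate the coordinate-wise sum for class y.
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # Divide each sum by the class size to obtain the centroid.
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is nearest to x (assumed Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Toy data, invented for illustration only.
samples = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(samples, labels)

print(centroids)
print(predict(centroids, (2.0, 2.0)))
```

Training takes a single pass over the data, and prediction compares the observation against one centroid per class. When two centroids are equally close, this sketch returns whichever label was encountered first during training; a real implementation may want an explicit tie-breaking rule.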

References

  1. ^ Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "14". Introduction to Information Retrieval.
  2. ^ Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). Proceedings of the National Academy of Sciences. 99 (10).