Nearest centroid classifier

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Citation bot at 09:49, 11 September 2021. It may differ significantly from the current revision.

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to each observation the label of the class whose training samples have the mean (centroid) closest to that observation.

When applied to text classification using tf*idf vectors to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]
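As a rough illustration of the text-classification setting, the sketch below builds tf-idf vectors for a tiny, made-up corpus and labels a new document by its nearest class centroid. The idf variant (log of N over document frequency), the length normalization, the use of Euclidean distance, and all function names and example documents are assumptions chosen for brevity, not details taken from the cited source.

```python
import math
from collections import Counter


def vectorize(doc, vocab, idf):
    """Map a tokenized document to a length-normalized tf-idf vector."""
    counts = Counter(doc)
    # Terms outside the training vocabulary are ignored.
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def tfidf_vectors(docs):
    """Return tf-idf vectors for docs, plus the vocabulary and idf weights."""
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / df).
    idf = {term: math.log(len(docs) / doc_freq[term]) for term in vocab}
    return [vectorize(doc, vocab, idf) for doc in docs], vocab, idf


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(doc, centroids, vocab, idf):
    """Return the label of the centroid nearest to the document's vector."""
    vec = vectorize(doc, vocab, idf)
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "cheap pills buy now".split(),
    "buy cheap watches now".split(),
    "meeting agenda for monday".split(),
    "monday project meeting notes".split(),
]
labels = ["spam", "spam", "ham", "ham"]

vectors, vocab, idf = tfidf_vectors(docs)
centroids = {
    label: centroid([v for v, y in zip(vectors, labels) if y == label])
    for label in set(labels)
}
print(classify("cheap watches".split(), centroids, vocab, idf))
```

Here each class is summarized by a single averaged tf-idf vector, so classifying a new document costs one distance computation per class rather than one per training document.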

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]

Algorithm

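  • Training procedure: given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.
  • Prediction function: the class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|$.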
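The training and prediction steps above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes Euclidean distance and dense, equal-length numeric feature vectors. The function names `fit_centroids` and `predict` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return a dict mapping each class label to the mean of its samples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        # Accumulate the component-wise sum for class y.
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    # Divide each class sum by the class size to get the centroid.
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical two-class training set in two dimensions.
samples = [[1.0, 1.0], [1.0, 3.0], [8.0, 8.0], [10.0, 8.0]]
labels = ["a", "a", "b", "b"]

centroids = fit_centroids(samples, labels)
print(centroids)
print(predict(centroids, [2.0, 2.0]))
```

Training makes a single pass over the data, and only one vector per class needs to be stored for prediction.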

References

  1. Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
  2. Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10): 6567–6572. doi:10.1073/pnas.082099299. PMC 124443. PMID 12011421.