Nearest centroid classifier
Appearance
![]() | This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
|
An editor has nominated this article for deletion. You are welcome to participate in the deletion discussion, which will decide whether or not to retain it. |
In machine learning, a nearest centroid or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation.
When applied to text classification using tf*idf vectors to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]
An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]
Algorithm
- Training procedure: given labeled training samples with class labels , compute the per-class centroids where is the set of indices of samples belonging to class .
- Prediction function: the class assigned to an observation is
See also
References
- ^ Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
- ^ Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10).