Cluster analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Kku at 12:40, 21 May 2004 (copy from disambiguation page). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Data clustering is a common technique for data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering consists of partitioning a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often similarity or proximity according to some defined distance measure.
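As a concrete illustration of partitioning by proximity, the following is a minimal sketch of k-means (listed under See also), one common clustering method among many. It assumes points are tuples of numbers, uses Euclidean distance as the distance measure, and takes the first k points as the initial centers for reproducibility; the function names are illustrative only and not part of any library.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: one possible choice of distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: List[Point], k: int, max_iter: int = 100) -> List[List[Point]]:
    """Partition `data` into k clusters using Lloyd's k-means iteration.

    Assumption: the initial centers are simply the first k points, chosen
    for determinism; practical implementations usually pick them randomly.
    """
    centers = list(data[:k])
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: the assignments can no longer change
        centers = new_centers
    return clusters


data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
result = kmeans(data, 2)
```

Here `kmeans(data, 2)` separates the three points near (1, 1) from the three near (8.5, 8.5), and every point lands in exactly one subset, so the result is a partition of the data set. Because the outcome depends on the initial centers, k-means is usually restarted from several different initializations in practice.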


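Data clustering algorithms can be hierarchical or partitional. Hierarchical algorithms build a hierarchy of nested clusters, finding successive clusters from previously established ones, whereas partitional algorithms determine all clusters at once. Hierarchical algorithms can in turn be agglomerative (bottom-up), starting with each element as a separate cluster and merging clusters successively, or divisive (top-down), starting with the whole data set and splitting it into successively smaller clusters.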
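The agglomerative (bottom-up) strategy can be sketched as follows. This illustrative version assumes Euclidean distance and single linkage (the distance between two clusters is the distance of their closest pair of points), which is only one of several possible linkage criteria. It naively recomputes all cluster distances at every merge, so it is meant for clarity rather than speed, and its names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1: List[Point], c2: List[Point]) -> float:
    """Cluster distance = distance between their closest pair of points."""
    return min(euclidean(p, q) for p in c1 for q in c2)


def agglomerative(data: List[Point], k: int) -> Tuple[List[List[Point]], List[float]]:
    """Bottom-up hierarchical clustering.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until only k remain. Returns the k clusters and the
    linkage distance of each merge, i.e. the levels of the hierarchy.
    """
    clusters: List[List[Point]] = [[p] for p in data]
    merge_heights: List[float] = []
    while len(clusters) > k:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        best: Tuple[float, int, int] = (math.inf, -1, -1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Merge cluster j into cluster i; since j > i, deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        merge_heights.append(d)
    return clusters, merge_heights


data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.0, 3.0)]
clusters, heights = agglomerative(data, 2)
```

Stopping at a different `k` simply cuts the same hierarchy at a different level, and the recorded merge heights show where each cut falls. A divisive algorithm would run in the opposite direction, starting from a single cluster holding all the data and splitting it repeatedly.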

Applications

In biology, clustering has two main applications, in the fields of computational biology and bioinformatics.

See also

k-means, artificial neural network (ANN)