
Talk:Hierarchical clustering

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Fmadd (talk | contribs) at 06:45, 14 May 2016. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This is the "main article" for Hierarchical clustering according to the Cluster_analysis page, yet that page actually has more information than this one about Hierarchical clustering algorithms. Surely this should not be... Electron100 (talk) 03:05, 21 September 2009 (UTC)[reply]

While I agree that this article is seriously lacking information, I believe this page needs to be enhanced rather than merged. Using K-means clustering as an example: the cluster analysis page gives an overview, but the main article provides more detailed information. (Humanpowered (talk) 15:30, 23 March 2011 (UTC))[reply]

Added WikiLink to User:Mathstat/Ward's_method — Preceding unsigned comment added by Jmajf (talkcontribs) 12:49, 28 November 2011 (UTC)[reply]

give example

Dear Sir, please write fluently and understandably about the several kinds of hierarchical clustering, and please give examples. — Preceding unsigned comment added by 83.172.123.165 (talk) 19:04, 16 December 2011 (UTC)[reply]


In the section "Metric", am I right that the "i" across which some of the distance metrics are summed is an index over data dimensions? I.e., for bivariate data, i = {1, 2}.

If so, it might help to state this definition of i in the text, to make it clear to simpletons like me! Also, two of the measures (Mahalanobis and cosine) do not sum across i. Does this mean they can only be used for univariate data? If not, is there another formula? — Preceding unsigned comment added by Periololon (talkcontribs) 14:44, 19 March 2012 (UTC)[reply]
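Not an answer from a source, just a quick illustrative sketch (function names and values are mine, not from the article): Euclidean distance sums explicitly over the dimension index i, while cosine distance hides the same range over dimensions inside its dot products, so both are well defined for multivariate data.

```python
import math

def euclidean(a, b):
    # explicit sum over the dimension index i: sqrt(sum_i (a_i - b_i)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # no visible "sum over i" in the usual notation, but the dot
    # products below do range over every dimension of the vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

a, b = [1.0, 2.0], [2.0, 4.0]
print(euclidean(a, b))        # sqrt(5), about 2.236
print(cosine_distance(a, b))  # parallel vectors give 0.0
```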

V-linkage V-means

I was interested in this technique but I haven't found any reference searching Google or Google Scholar. We need a source/reference. Moo (talk) 20:25, 11 May 2012 (UTC)[reply]

I found the following on what appears to be an old copy of the article cluster analysis at http://biocomp.bioen.uiuc.edu/oscar/tools/Hierarchical_Clustering.html
V-means clustering
V-means clustering utilizes cluster analysis and nonparametric statistical tests to key researchers into segments of data that may contain distinct homogeneous sub-sets. The methodology embraced by V-means clustering circumvents many of the problems that traditionally beleaguer standard techniques for categorizing data. First, instead of relying on analyst predictions for the number of distinct sub-sets (k-means clustering), V-means clustering generates a Pareto-optimal number of sub-sets. V-means clustering is calibrated to a user-defined confidence level p, whereby the algorithm divides the data and then recombines the resulting groups until the probability that any given group belongs to the same distribution as either of its neighbors is less than p.
Second, V-means clustering makes use of repeated iterations of the nonparametric Kolmogorov-Smirnov test. Standard methods of dividing data into its constituent parts are often entangled in definitions of distances (distance measure clustering) or in assumptions about the normality of the data (expectation maximization clustering), but nonparametric analysis draws inference from the distribution functions of sets.
Third, the method is conceptually simple. Some methods combine multiple techniques in sequence in order to produce more robust results. From a practical standpoint this muddles the meaning of the results and frequently leads to conclusions typical of “data dredging.”
Unfortunately there was no citation. Melcombe (talk) 22:27, 11 May 2012 (UTC)[reply]
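For what it's worth, here is a rough sketch (my own illustration, not from the lost article) of the split-and-recombine idea in the quoted text, using the two-sample Kolmogorov-Smirnov statistic with a plain threshold standing in for the p-value criterion:

```python
import bisect

def ks_statistic(xs, ys):
    # two-sample Kolmogorov-Smirnov statistic: the maximum gap
    # between the two empirical distribution functions
    xs, ys = sorted(xs), sorted(ys)
    def ecdf(sorted_data, v):
        return bisect.bisect_right(sorted_data, v) / len(sorted_data)
    return max(abs(ecdf(xs, v) - ecdf(ys, v))
               for v in sorted(set(xs) | set(ys)))

def merge_similar(groups, threshold=0.5):
    # recombine neighbouring groups whose empirical distributions are
    # indistinguishable at the chosen threshold (a crude stand-in for
    # the confidence level p in the quoted description)
    merged = [groups[0]]
    for g in groups[1:]:
        if ks_statistic(merged[-1], g) < threshold:
            merged[-1] = merged[-1] + g
        else:
            merged.append(g)
    return merged

print(merge_similar([[1, 2, 3], [2, 3, 4], [100, 101, 102]]))
# the first two similar groups merge; the distant third stays separate
```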

Hierarchical Clustering References

There is a 1967 paper, published in Psychometrika, titled "Hierarchical Clustering Schemes", by S. C. Johnson (yes, that's me...). It was extensively cited in the '70s and '80s, in part because Bell Labs gave away a FORTRAN program for free that did a couple of the methods described in the paper. The paper pointed out that there is a correspondence between hierarchical clusterings and a kind of data metric called an ultrametric -- whenever you have a hierarchical clustering, it implies an ultrametric, and conversely. 76.244.36.165 (talk) 19:14, 18 October 2012 (UTC) Stephen C Johnson[reply]
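A small illustration of that correspondence (mine, not from the paper): the cophenetic distances read off a dendrogram, i.e. the merge height at which two points first join the same cluster, satisfy the strong triangle inequality that defines an ultrametric.

```python
from itertools import permutations

def is_ultrametric(d, points):
    # an ultrametric satisfies the strong triangle inequality
    # d(x, z) <= max(d(x, y), d(y, z)) for every triple of points
    return all(d[x][z] <= max(d[x][y], d[y][z])
               for x, y, z in permutations(points, 3))

# cophenetic distances from a toy dendrogram:
# {a, b} merge at height 1, then {a, b} and {c} merge at height 3
coph = {
    "a": {"a": 0, "b": 1, "c": 3},
    "b": {"a": 1, "b": 0, "c": 3},
    "c": {"a": 3, "b": 3, "c": 0},
}
print(is_ultrametric(coph, "abc"))  # True
```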

Example for Agglomerative Clustering

I changed The "increase" in variance for the cluster being merged (Ward's method[7]) to The "decrease" in variance for the cluster being merged (Ward's method[7]). The same change applies above, under Cluster dissimilarity, and matches how it appears in Ward's method, https://en.wikipedia.org/wiki/Ward%27s_method

 — Preceding unsigned comment added by 2A02:5D8:200:600:82:150:200:4 (talk) 11:44, 20 August 2015 (UTC)[reply] 
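For reference, a minimal 1-D sketch (my own, not taken from the article) of Ward's merge cost as the change in total within-cluster error sum of squares. Note the change is never negative, since merging cannot lower the total ESS, which may bear on the increase/decrease wording above.

```python
def ess(cluster):
    # error sum of squares of a 1-D cluster around its mean
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def ward_merge_cost(a, b):
    # change in total ESS caused by merging clusters a and b;
    # algebraically equal to |A||B|/(|A|+|B|) * (mean_A - mean_B)^2
    return ess(a + b) - (ess(a) + ess(b))

a, b = [1.0, 2.0], [10.0, 11.0]
print(ward_merge_cost(a, b))  # 81.0, a (non-negative) increase in ESS
```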

Divisive algorithms, hierarchical k-means

I think that hierarchical k-means deserves a mention or description, maybe even its own page. As a starting point I'm mentioning it here. Perhaps the way to do it is Hierarchical clustering#(agglomerative methods#(...),divisive#(hierarchical-kmeans,..others..)) Fmadd (talk) 06:45, 14 May 2016 (UTC)[reply]
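To make the suggestion concrete, a toy 1-D sketch (illustrative only; function names are mine) of divisive hierarchical k-means: recursively split each cluster with 2-means until the clusters are too small to split further.

```python
def two_means(xs, iters=20):
    # Lloyd's algorithm for k = 2 on 1-D data
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        left = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        right = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if left:
            c0 = sum(left) / len(left)
        if right:
            c1 = sum(right) / len(right)
    return left, right

def hierarchical_kmeans(xs, min_size=2):
    # divisive clustering: recursively bisect each cluster with 2-means,
    # yielding a nested-list "dendrogram" of the data
    if len(xs) <= min_size:
        return xs
    left, right = two_means(xs)
    if not left or not right:
        return xs
    return [hierarchical_kmeans(left), hierarchical_kmeans(right)]

print(hierarchical_kmeans([1.0, 1.1, 5.0, 5.1, 9.0, 9.2]))
```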