Correlation Clustering

Correlation Clustering operates in a scenario where the relationship between the objects is known instead of the actual representation of the objects. For example, given a fully connected graph $G=(V,E)$ where the edge label indicates whether two nodes are similar (+) or different (-), the task is to cluster the vertices such that similar objects are grouped together. Unlike other clustering algorithms this doesn't require the number of clusters $k$ to be specified because the objective is to maximize the disagreements and is independent of the number of clusters.

Notice that, it may not be possible to find a perfect clustering i.e. all the similar items are in a cluster while all dissimilar vertices are in different clusters. If the graph indeed admits a perfect clustering, then simply deleting all the negative edges and finding the connected components in the remaining graph will return the required clusters.

But, in general a graph may not have a perfect clustering. For example, given nodes {a,b,c} such that (a,b) and (a,c) are similar while (b,c) are dissimilar a perfect clustering is not possible. In such cases, the task is to find a clustering that maximizes the number of agreements (number of + edges inside clusters plus the number of - edges between clusters) or minimizes the number of disagreements (the number of - edges inside clusters plus the number of + edges between clusters). This problem of maximizing the agreements is NP-complete ( Multiway cut problem reduces to maximizing weighted agreements and the problem of partitioning into triangles^[1] can be reduced to unweighted version)

Bansal et. al.^[2] discusses the NP-completeness proof and also presents both a constant factor approximation algorithm and polynomial-time approximation scheme to find the clusters in this setting. Nir et. al.^[3] proposes a randomized 3-approximation algorithm for the same problem.

CC-Pivot(G=(V,E⁺,E^-))

   Pick random pivot i ∈ V
   Set  $C=\{i\}$ , V'=Ø
   For all j ∈ V, j ≠ i;
       If (i,j) ∈ E⁺ then
            Add j to C
       Else (If (i,j) ∈ E^-)
            Add j to V'
   Let G' be the subgraph induced by V'
   Return clustering C,CC-Pivot(G')

The authors show that the above algorithm is a 3-approximation algorithm for correlation clustering.

References

^ Garey, M. and Johnson, D. (2000). Computers and Intractability: A Guide to the Theory of NP-Completeness. {{cite conference}}: Unknown parameter |organization= ignored (help)CS1 maint: multiple names: authors list (link)
^ Bansal, N., Blum, A. and Chawla, S. (2004). "Correlation Clustering". Machine Learning Journal (Special Issue on Theoritical Advances in Data Clustering,. pp. 86–113, . {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: extra punctuation (link) CS1 maint: multiple names: authors list (link)
^ Ailon, Nir and Charikar, Moses and Newman, Alantha (2005). "Aggregating inconsistent information: ranking and clustering". STOC '05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing. pp. 684–693, . {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: extra punctuation (link) CS1 maint: multiple names: authors list (link)

[1] Garey, M. and Johnson, D. (2000). Computers and Intractability: A Guide to the Theory of NP-Completeness. {{cite conference}}: Unknown parameter |organization= ignored (help)CS1 maint: multiple names: authors list (link)

[2] Bansal, N., Blum, A. and Chawla, S. (2004). "Correlation Clustering". Machine Learning Journal (Special Issue on Theoritical Advances in Data Clustering,. pp. 86–113, . {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: extra punctuation (link) CS1 maint: multiple names: authors list (link)

[3] Ailon, Nir and Charikar, Moses and Newman, Alantha (2005). "Aggregating inconsistent information: ranking and clustering". STOC '05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing. pp. 684–693, . {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: extra punctuation (link) CS1 maint: multiple names: authors list (link)

[1]

[2]

[3]