Clustering coefficient

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties (Holland and Leinhardt, 1971^[1]; Watts and Strogatz, 1998^[2]). In such networks, the probability of a tie within these groups tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971; Watts and Strogatz, 1998).

Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the embeddedness of single nodes.

The global clustering coefficient

The global clustering coefficient is based on triplets of nodes. A triplet is three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A triangle consists of three closed triplets, one centred on each of its nodes. The global clustering coefficient is the number of closed triplets (or 3 × the number of triangles) divided by the total number of triplets, both open and closed. The first attempt to measure it was made by Luce and Perry (1949)^[3]. This measure, often called transitivity (see Wasserman and Faust, 1994, page 243^[4]), gives an indication of the clustering in the whole network (hence global) and can be applied to both undirected and directed networks.

Formally, it has been defined as:

C={\frac {3\times {\mbox{number of triangles}}}{\mbox{number of connected triples of vertices}}}={\frac {\mbox{number of closed triplets}}{\mbox{number of connected triples of vertices}}}.
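The ratio above can be computed by visiting every vertex and checking each pair of its neighbours. The following is a minimal Python sketch, assuming an undirected simple graph stored as a dictionary that maps each vertex to the set of its neighbours; the names `build_adjacency` and `global_clustering` are illustrative, not taken from any library.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build a symmetric adjacency dict from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def global_clustering(adj):
    """Closed triplets divided by all connected triplets (open and closed).

    Assumes adj is symmetric and has no self-loops.
    """
    closed = 0
    triplets = 0
    for v, nbrs in adj.items():
        # Each unordered pair of neighbours of v is a triplet centred on v.
        for u, w in combinations(nbrs, 2):
            triplets += 1
            if w in adj[u]:
                closed += 1  # u and w are adjacent, so the triplet is closed
    # The ratio is undefined without triplets; 0.0 is this sketch's convention.
    return closed / triplets if triplets else 0.0


# Hypothetical example: a triangle a-b-c with a pendant vertex d attached to c.
g = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(global_clustering(g))  # 3 closed triplets out of 5 -> 0.6
```

The single triangle contributes three closed triplets, one centred on each of a, b and c, which matches the numerator 3 × (number of triangles); c also centres two open triplets (a-c-d and b-c-d).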

A generalisation to weighted networks was proposed by Opsahl and Panzarasa (2009)^[5], and a redefinition to two-mode networks (both binary and weighted) by Opsahl (2009)^[6].

Local clustering coefficient

The local clustering coefficient of a vertex in a graph quantifies how close its neighbours are to being a clique (complete graph). Duncan J. Watts and Steven Strogatz introduced the measure in 1998 to determine whether a graph is a small-world network.

A graph $G=(V,E)$ formally consists of a set of vertices $V$ and a set of edges $E$ between them. An edge $e_{ij}$ connects vertex $v_{i}$ with vertex $v_{j}$.

The neighbourhood $N_{i}$ of a vertex $v_{i}$ is defined as the set of its immediately connected neighbours:

N_{i}=\{v_{j}:e_{ij}\in E\lor e_{ji}\in E\}.

The degree $k_{i}$ of a vertex is defined as the number of vertices, $|N_{i}|$, in its neighbourhood $N_{i}$.

The local clustering coefficient $C_{i}$ for a vertex $v_{i}$ is then given by the number of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, $e_{ij}$ is distinct from $e_{ji}$, and therefore for each neighbourhood $N_{i}$ there are $k_{i}(k_{i}-1)$ links that could exist among the vertices within the neighbourhood ($k_{i}=|N_{i}|$ is the number of distinct neighbours of the vertex, each counted once whether it is joined by an in-link, an out-link or both). Thus, the local clustering coefficient for directed graphs is given as

C_{i}={\frac {|\{e_{jk}:v_{j},v_{k}\in N_{i},e_{jk}\in E\}|}{k_{i}(k_{i}-1)}}.
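For a directed graph, the count can be sketched in Python as follows. This is a minimal sketch assuming the graph is a set of `(source, target)` pairs without self-loops, and that $N_{i}$ contains every vertex joined to $v_{i}$ by an edge in either direction, so that $k_{i}=|N_{i}|$. Returning 0 when $k_{i}<2$ is an assumption of the sketch, since the formula is then undefined; `directed_local_clustering` is an illustrative name.

```python
def directed_local_clustering(edges, i):
    """Local clustering coefficient C_i of vertex i in a directed graph.

    edges: set of (source, target) pairs, assumed to contain no self-loops.
    """
    # Neighbourhood N_i: vertices joined to i by an edge in either direction.
    neighbours = {t for s, t in edges if s == i} | {s for s, t in edges if t == i}
    k = len(neighbours)
    if k < 2:
        return 0.0  # C_i is undefined here; 0 is this sketch's convention
    # Count directed edges e_jk with both endpoints in N_i; e_jk and e_kj are
    # counted separately, out of k(k - 1) possible ordered pairs.
    links = sum(1 for s, t in edges if s in neighbours and t in neighbours)
    return links / (k * (k - 1))


edges = {(1, 2), (3, 1), (2, 3)}
print(directed_local_clustering(edges, 1))  # one of two possible links -> 0.5
edges.add((3, 2))
print(directed_local_clustering(edges, 1))  # both directions present -> 1.0
```

Vertex 1 has neighbourhood {2, 3} (one out-link, one in-link), so two directed links are possible between its neighbours; adding the reverse edge 3 → 2 fills the second one.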

An undirected graph has the property that $e_{ij}$ and $e_{ji}$ are considered identical. Therefore, if a vertex $v_{i}$ has $k_{i}$ neighbours, ${\frac {k_{i}(k_{i}-1)}{2}}$ edges could exist among the vertices within the neighbourhood. Thus, the local clustering coefficient for undirected graphs can be defined as

C_{i}={\frac {2|\{e_{jk}:v_{j},v_{k}\in N_{i},e_{jk}\in E\}|}{k_{i}(k_{i}-1)}}.
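The undirected formula can be sketched in Python in the same spirit, assuming a symmetric adjacency dictionary without self-loops. Each unordered pair of neighbours is examined once, so the number of links is doubled before dividing by $k_{i}(k_{i}-1)$. Returning 0 for $k_{i}<2$ is a convention of this sketch, and `local_clustering` is an illustrative name.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Local clustering coefficient C_i of vertex i in an undirected graph.

    adj maps each vertex to the set of its neighbours (symmetric, no self-loops).
    """
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # C_i is undefined for k_i < 2; 0 is this sketch's convention
    # Number of edges e_jk among the neighbours, each unordered pair checked once.
    links = sum(1 for j, l in combinations(nbrs, 2) if l in adj[j])
    return 2 * links / (k * (k - 1))


# Hypothetical example: triangle a-b-c with a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
for v in sorted(adj):
    print(v, local_clustering(adj, v))
```

Vertex c has three neighbours and one edge (a-b) among them, so $C_{c}=2\cdot 1/(3\cdot 2)=1/3$, while a and b reach the maximum value of 1.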

Let $\lambda _{G}(v)$ be the number of triangles on $v\in V(G)$ for an undirected graph $G$. That is, $\lambda _{G}(v)$ is the number of subgraphs of $G$ with 3 edges and 3 vertices, one of which is $v$. Let $\tau _{G}(v)$ be the number of triples on $v\in V(G)$. That is, $\tau _{G}(v)$ is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is $v$ and such that $v$ is incident to both edges. Then we can also define the clustering coefficient as

C_{i}={\frac {\lambda _{G}(v_{i})}{\tau _{G}(v_{i})}}.

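The two preceding definitions are equivalent. Each edge $e_{jk}$ between two neighbours of $v_{i}$ closes exactly one triangle on $v_{i}$, so $\lambda _{G}(v_{i})=|\{e_{jk}\}|$; and each unordered pair of neighbours, together with $v_{i}$, forms exactly one triple on $v_{i}$, so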

\tau _{G}(v_{i})={\binom {k_{i}}{2}}={\frac {1}{2}}k_{i}(k_{i}-1).

These measures are 1 if every neighbour connected to $v_{i}$ is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to $v_{i}$ connects to any other vertex that is connected to $v_{i}$.
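The equivalence of the triangle-based and edge-counting definitions can also be checked numerically. The Python sketch below counts $\lambda _{G}(v)$ and $\tau _{G}(v)$ by brute force over all pairs of other vertices, without using the neighbourhood, and compares the results with the binomial coefficient $\binom{k_{i}}{2}$ and with the edge-counting formula. The symmetric adjacency dictionary, the example graph and the function names are assumptions for illustration; the sketch also assumes every vertex has at least two neighbours.

```python
from itertools import combinations
from math import comb, isclose


def triangles_and_triples(adj, v):
    """Return (lambda_G(v), tau_G(v)) by enumerating pairs of other vertices."""
    lam = tau = 0
    others = [u for u in adj if u != v]
    for u, w in combinations(others, 2):
        if u in adj[v] and w in adj[v]:
            tau += 1  # path u - v - w: 2 edges, 3 vertices, v incident to both
            if w in adj[u]:
                lam += 1  # edge u - w is also present: a triangle on v
    return lam, tau


def edge_count_clustering(adj, v):
    """2|{e_jk}| / (k(k - 1)) over edges among the neighbours of v (k >= 2)."""
    nbrs = adj[v]
    k = len(nbrs)
    links = sum(1 for j, l in combinations(nbrs, 2) if l in adj[j])
    return 2 * links / (k * (k - 1))


# Hypothetical graph: triangles a-b-c and c-d-e sharing vertex c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
for v in sorted(adj):
    lam, tau = triangles_and_triples(adj, v)
    assert tau == comb(len(adj[v]), 2)  # tau_G(v) = C(k_i, 2)
    assert isclose(lam / tau, edge_count_clustering(adj, v))
    print(v, lam, tau, lam / tau)
```

For the shared vertex c there are six triples (one per pair of its four neighbours) but only two triangles, giving 1/3; every other vertex lies on one triangle and one triple, giving 1.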

Network average clustering coefficient

The clustering coefficient for the whole network is given by Watts and Strogatz^[2] as the average of the local clustering coefficients of all $n$ vertices:

{\bar {C}}={\frac {1}{n}}\sum _{i=1}^{n}C_{i}.
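As a worked example, consider a hypothetical graph on four vertices: a triangle on $a$, $b$ and $c$, plus a vertex $d$ joined only to $c$. Vertices $a$ and $b$ each have two neighbours joined by one edge, and $c$ has three neighbours with one edge ($a$-$b$) among them. Vertex $d$ has a single neighbour, so the formula for $C_{d}$ is undefined; it is set to 0 here as an assumption (leaving such vertices out of the average is another option).

C_{a}=C_{b}={\frac {2\times 1}{2\times 1}}=1,\qquad C_{c}={\frac {2\times 1}{3\times 2}}={\frac {1}{3}},\qquad C_{d}=0,

{\bar {C}}={\frac {1}{4}}\left(1+1+{\frac {1}{3}}+0\right)={\frac {7}{12}}\approx 0.583.

For the same graph the global clustering coefficient is $3/5=0.6$ (three closed triplets out of five triplets), so the network average and the global coefficient generally give different values.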

A graph is considered small-world if its average local clustering coefficient ${\bar {C}}$ is significantly higher than that of a random graph constructed on the same vertex set, and if the graph has a short mean shortest-path length.
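The two quantities in this criterion can be computed with a short Python sketch. It assumes the random reference graph is a $G(n,m)$ graph on the same vertex set with the same number of edges (the text only requires the same vertex set), computes the mean shortest-path length by breadth-first search over reachable pairs only (a random graph may be disconnected), and counts vertices with fewer than two neighbours as $C_{i}=0$. All function names and the ring-lattice input are hypothetical; the code reports the numbers but does not decide what counts as "significantly higher".

```python
import random
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean of local coefficients; vertices with k_i < 2 count as 0 (assumption)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for j, l in combinations(nbrs, 2) if l in adj[j])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


def mean_shortest_path(adj):
    """Mean BFS distance over ordered pairs of distinct, mutually reachable vertices."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("inf")


def random_graph_like(adj, seed=0):
    """G(n, m) random graph on the same vertices with the same number of edges."""
    rng = random.Random(seed)
    vertices = list(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    rand = {v: set() for v in vertices}
    for u, w in rng.sample(list(combinations(vertices, 2)), m):
        rand[u].add(w)
        rand[w].add(u)
    return rand


def ring_lattice(n, half_k):
    """Each vertex joined to its half_k nearest neighbours on each side of a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, half_k + 1):
            adj[v].add((v + d) % n)
            adj[(v + d) % n].add(v)
    return adj


lattice = ring_lattice(60, 2)
rand = random_graph_like(lattice, seed=1)
print("C:", average_clustering(lattice), average_clustering(rand))
print("L:", mean_shortest_path(lattice), mean_shortest_path(rand))
```

In this lattice every vertex has $C_{i}=1/2$, well above the value of roughly the edge density (about 0.07 here) expected for a random graph of the same size and density. However, its mean shortest path is long (465/59 ≈ 7.9 for $n=60$), so it meets only the first of the two conditions.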

A generalisation to weighted networks was proposed by Barrat et al. (2004)^[7], and a redefinition to two-mode networks (both binary and weighted) by Opsahl (2009)^[8].

References

[1] P. W. Holland and S. Leinhardt (1971). "Transitivity in structural models of small groups". Comparative Group Studies. 2: 107–124.
[2] D. J. Watts and S. H. Strogatz (1998). "Collective dynamics of 'small-world' networks". Nature. 393: 440–442. doi:10.1038/30918.
[3] R. D. Luce and A. D. Perry (1949). "A method of matrix analysis of group structure". Psychometrika. 14 (1): 95–116. doi:10.1007/BF02289146.
[4] S. Wasserman and K. Faust (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
[5] T. Opsahl and P. Panzarasa (2009). "Clustering in Weighted Networks". Social Networks. 31 (2): 155–163. doi:10.1016/j.socnet.2009.02.002.
[6] T. Opsahl (2009). "Clustering in Two-mode Networks". Conference and Workshop on Two-Mode Social Analysis (Sept 30 – Oct 2, 2009).
[7] A. Barrat, M. Barthélemy, R. Pastor-Satorras and A. Vespignani (2004). "The architecture of complex weighted networks". Proceedings of the National Academy of Sciences. 101 (11): 3747–3752. doi:10.1073/pnas.0400087101.
[8] T. Opsahl (2009). "Clustering in Two-mode Networks". Conference and Workshop on Two-Mode Social Analysis (Sept 30 – Oct 2, 2009).
