Modularity (networks)

Measure of network community structure

Modularity is a measure of the structure of networks or graphs that quantifies the strength of a division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used as the objective function in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent: it finds communities even in its own null model, i.e. fully random graphs, and therefore it cannot be used to find statistically significant community structures in empirical networks. Furthermore, modularity has been shown to suffer from a resolution limit, so it may be unable to detect small communities.

Definition

Modularity is the fraction of the edges that fall within the given groups minus the expected fraction if edges were distributed at random. For unweighted and undirected graphs, the value of the modularity lies in the range $[-1/2,1]$.^[3] It is positive if the number of edges within groups exceeds the number expected on the basis of chance. For a given division of the network's vertices into modules, modularity reflects the concentration of edges within modules compared with a random distribution of links between all nodes regardless of modules.

There are different methods for calculating modularity.^[1] In the most common version of the concept, the randomization of the edges is done so as to preserve the degree of each vertex. Consider a graph with $n$ nodes and $m$ links (edges) such that the graph can be partitioned into two communities using a membership variable $s$. If a node $v$ belongs to community 1, $s_{v}=1$; if $v$ belongs to community 2, $s_{v}=-1$. Let the adjacency matrix of the network be $A$, where $A_{vw}=0$ means there is no edge (no interaction) between nodes $v$ and $w$, and $A_{vw}=1$ means there is an edge between the two. For simplicity we consider an undirected network, so $A_{vw}=A_{wv}$. (Multiple edges may exist between two nodes, but here we treat the simplest case.)

Modularity $Q$ is then defined as the fraction of edges that fall within group 1 or 2, minus the expected fraction of edges within groups 1 and 2 for a random graph with the same node degree distribution as the given network.

The expected number of edges is computed using the configuration model.^[4] The configuration model is a randomized realization of a particular network. Given a network with $n$ nodes, where each node $v$ has degree $k_{v}$, the configuration model cuts each edge into two halves, and then each half-edge, called a stub, is rewired randomly to any other stub in the network, even allowing self-loops (which occur when a stub is rewired to another stub from the same node) and multiple edges between the same two nodes. Thus, even though the degree of every node remains intact, the configuration model results in an otherwise completely random network.
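The stub-rewiring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `configuration_model` and `degree_sequence` and the example degree list are assumptions made for this sketch.

```python
import random

def configuration_model(degrees, seed=None):
    """Return a random multigraph (as an edge list) with the given degrees.

    degrees[v] is the degree k_v of node v. Self-loops and multiple edges
    are allowed, as in the configuration model described in the text.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    # One stub per half-edge: node v appears k_v times.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Pairing consecutive entries of a uniformly shuffled list
    # matches the stubs uniformly at random.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def degree_sequence(edges, n):
    """Recompute node degrees from an edge list; a self-loop counts twice."""
    deg = [0] * n
    for v, w in edges:
        deg[v] += 1
        deg[w] += 1
    return deg

degrees = [3, 2, 2, 3, 2]   # hypothetical degree sequence, so 2m = 12
edges = configuration_model(degrees, seed=0)
```

Whatever the seed, the rewired graph has $m$ edges and reproduces the input degrees exactly; only the choice of which stubs are joined is random.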

Expected Number of Edges Between Nodes

Now consider two nodes $v$ and $w$, with node degrees $k_{v}$ and $k_{w}$ respectively, from a randomly rewired network as described above. We calculate the expected number of full edges between these nodes.

Consider each of the $k_{v}$ stubs of node $v$ and create associated indicator variables $I_{i}^{(v,w)}$ for them, $i=1,\ldots ,k_{v}$, with $I_{i}^{(v,w)}=1$ if the $i$-th stub happens to connect to one of the $k_{w}$ stubs of node $w$ in this particular random graph, and $I_{i}^{(v,w)}=0$ otherwise. Since the $i$-th stub of node $v$ can connect to any of the $2m-1$ remaining stubs with equal probability, and since $k_{w}$ of those stubs belong to node $w$,

$p(I_{i}^{(v,w)}=1)=E[I_{i}^{(v,w)}]={\frac {k_{w}}{2m-1}}$

The total number of full edges $J_{vw}$ between $v$ and $w$ is just $J_{vw}=\sum _{i=1}^{k_{v}}I_{i}^{(v,w)}$, so by linearity of expectation the expected value of this quantity is

$E[J_{vw}]=E\left[\sum _{i=1}^{k_{v}}I_{i}^{(v,w)}\right]=\sum _{i=1}^{k_{v}}E[I_{i}^{(v,w)}]=\sum _{i=1}^{k_{v}}{\frac {k_{w}}{2m-1}}={\frac {k_{v}k_{w}}{2m-1}}$

Many texts then make the following approximations for random networks with a large number of edges. When $m$ is large, they drop the subtraction of $1$ in the denominator above and simply use the approximate expression ${\frac {k_{v}k_{w}}{2m}}$ for the expected number of edges between two nodes. Additionally, in a large random network, the number of self-loops and multi-edges is vanishingly small.^[5] Ignoring self-loops and multi-edges allows one to assume that there is at most one edge between any two nodes. In that case, $J_{vw}$ becomes a binary indicator variable, so its expected value is also the probability that it equals $1$, which means one can approximate the probability of an edge existing between nodes $v$ and $w$ as ${\frac {k_{v}k_{w}}{2m}}$.
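The expected value derived above can be checked numerically on a small example. The following sketch repeatedly rewires the stubs of a hypothetical degree sequence and averages the number of edges between two chosen nodes; the function name `expected_edges_estimate`, the degree list, the node pair and the number of trials are all assumptions of this illustration.

```python
import random

def expected_edges_estimate(degrees, v, w, trials=20000, seed=0):
    """Monte Carlo estimate of E[J_vw] under random stub matching (v != w)."""
    rng = random.Random(seed)
    stubs = [u for u, k in enumerate(degrees) for _ in range(k)]
    total = 0
    for _ in range(trials):
        rng.shuffle(stubs)
        # Count the matched stub pairs that join v and w in this realization.
        for i in range(0, len(stubs), 2):
            a, b = stubs[i], stubs[i + 1]
            if (a == v and b == w) or (a == w and b == v):
                total += 1
    return total / trials

degrees = [3, 2, 2, 3, 2]      # hypothetical degrees, 2m = 12
two_m = sum(degrees)
v, w = 0, 3                    # both nodes have degree 3

exact = degrees[v] * degrees[w] / (two_m - 1)   # k_v k_w / (2m - 1) = 9/11
approx = degrees[v] * degrees[w] / two_m        # k_v k_w / (2m)     = 9/12
estimate = expected_edges_estimate(degrees, v, w)
```

On a network this small the two expressions differ noticeably (9/11 versus 9/12), which is why the $2m$ approximation is usually justified only for large $m$.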

Modularity

Hence, the difference between the actual number of edges between nodes $v$ and $w$ and the expected number of edges between them is

$A_{vw}-{\frac {k_{v}k_{w}}{2m}}$

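Summing over all pairs of nodes that lie in the same community and normalizing by $2m$ gives the equation for modularity, $Q$.^[1] The factor ${\frac {s_{v}s_{w}+1}{2}}$ equals $1$ when $v$ and $w$ are in the same community and $0$ otherwise.

$Q={\frac {1}{2m}}\sum _{vw}\left[A_{vw}-{\frac {k_{v}k_{w}}{2m}}\right]{\frac {s_{v}s_{w}+1}{2}}$  (3)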
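As a minimal sketch of how Eq. 3 can be evaluated, the following Python function computes $Q$ directly from an adjacency matrix and a $\pm 1$ membership vector. The function name `modularity_two_groups` and the six-node example graph are assumptions made for illustration only.

```python
def modularity_two_groups(A, s):
    """Evaluate Eq. 3 for a two-way split.

    A: symmetric 0/1 adjacency matrix as a list of lists.
    s: list of +1/-1 membership values, one per node.
    """
    n = len(A)
    k = [sum(row) for row in A]   # node degrees k_v
    two_m = sum(k)                # 2m, twice the number of edges
    total = 0.0
    for v in range(n):
        for w in range(n):
            same_group = (s[v] * s[w] + 1) / 2   # 1 if same community, else 0
            total += (A[v][w] - k[v] * k[w] / two_m) * same_group
    return total / two_m

# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for v, w in edges:
    A[v][w] = A[w][v] = 1

q_split = modularity_two_groups(A, [1, 1, 1, -1, -1, -1])  # 5/14, about 0.357
q_whole = modularity_two_groups(A, [1, 1, 1, 1, 1, 1])     # 0 for no division
```

Splitting the graph at the bridging edge gives $Q=5/14$, whereas placing every node in one group gives $Q=0$.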

Note that Eq. 3 holds only for partitioning into two communities. Hierarchical partitioning (i.e. partitioning into two communities, then further partitioning each of the two sub-communities into two smaller sub-communities, with each split chosen to maximize $Q$) is a possible approach to identify multiple communities in a network. Additionally, Eq. 3 can be generalized for partitioning a network into $c$ communities.^[6]

$Q={\frac {1}{2m}}\sum _{vw}\left[A_{vw}-{\frac {k_{v}k_{w}}{2m}}\right]\delta (c_{v},c_{w})=\sum _{i=1}^{c}(e_{ii}-a_{i}^{2})$  (4)

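where $c_{v}$ denotes the community to which vertex $v$ is assigned, $\delta (c_{v},c_{w})$ is $1$ if $c_{v}=c_{w}$ and $0$ otherwise, and $e_{ij}$ is the fraction of edges with one end vertex in community $i$ and the other in community $j$:

$e_{ij}=\sum _{vw}{\frac {A_{vw}}{2m}}1_{v\in c_{i}}1_{w\in c_{j}}$

and $a_{i}$ is the fraction of ends of edges that are attached to vertices in community $i$:

$a_{i}={\frac {k_{i}}{2m}}=\sum _{j}e_{ij}$

Here $k_{i}$ denotes the sum of the degrees of the vertices in community $i$.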
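The two forms of Eq. 4 can be compared numerically. The sketch below evaluates $Q$ once with the double sum over node pairs and once through $e_{ij}$ and $a_{i}$; the function names and the six-node example graph are hypothetical and chosen only for this illustration.

```python
def modularity_delta(A, c):
    """Q as the double sum over node pairs with delta(c_v, c_w)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    total = sum(
        A[v][w] - k[v] * k[w] / two_m
        for v in range(n)
        for w in range(n)
        if c[v] == c[w]            # delta(c_v, c_w) = 1
    )
    return total / two_m

def modularity_from_e(A, c):
    """Q as the sum over communities of e_ii - a_i^2."""
    two_m = sum(sum(row) for row in A)
    labels = sorted(set(c))
    index = {label: i for i, label in enumerate(labels)}
    e = [[0.0] * len(labels) for _ in labels]
    for v, row in enumerate(A):
        for w, a_vw in enumerate(row):
            e[index[c[v]]][index[c[w]]] += a_vw / two_m
    a = [sum(row) for row in e]    # a_i = sum_j e_ij
    return sum(e[i][i] - a[i] ** 2 for i in range(len(labels)))

# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for v, w in edges:
    A[v][w] = A[w][v] = 1

c_two = [0, 0, 0, 1, 1, 1]     # the two triangles
c_three = [0, 0, 1, 1, 2, 2]   # a deliberately poor three-way split

q_two = modularity_from_e(A, c_two)       # 5/14
q_three = modularity_from_e(A, c_three)   # 4/49
```

Both functions return the same value for any labelling, and the poorer three-way split scores much lower ($4/49$) than the split into the two triangles ($5/14$).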

Matrix formulation

An alternative formulation of the modularity, useful particularly in spectral optimization algorithms, is as follows.^[1] Define $S_{vr}$ to be $1$ if vertex $v$ belongs to group $r$ and $0$ otherwise. Then

$\delta (c_{v},c_{w})=\sum _{r}S_{vr}S_{wr}$

and hence

$Q={\frac {1}{2m}}\sum _{vw}\sum _{r}\left[A_{vw}-{\frac {k_{v}k_{w}}{2m}}\right]S_{vr}S_{wr}={\frac {1}{2m}}\mathrm {Tr} (\mathbf {S} ^{\mathrm {T} }\mathbf {BS} ),$

where $\mathbf {S}$ is the (non-square) matrix having elements $S_{vr}$ and $\mathbf {B}$ is the so-called modularity matrix, which has elements

$B_{vw}=A_{vw}-{\frac {k_{v}k_{w}}{2m}}.$

All rows and columns of the modularity matrix sum to zero, which means that the modularity of an undivided network is also always $0$.
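The matrix formulation and the zero row sums can be illustrated with a short pure-Python sketch. The function names `modularity_matrix` and `modularity_trace` and the six-node example graph are assumptions of this illustration.

```python
def modularity_matrix(A):
    """Return B with elements B_vw = A_vw - k_v k_w / (2m)."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    n = len(A)
    return [[A[v][w] - k[v] * k[w] / two_m for w in range(n)] for v in range(n)]

def modularity_trace(A, c, num_groups):
    """Q = Tr(S^T B S) / (2m), where S_vr = 1 if node v is in group r."""
    B = modularity_matrix(A)
    two_m = sum(sum(row) for row in A)
    n = len(A)
    S = [[1 if c[v] == r else 0 for r in range(num_groups)] for v in range(n)]
    # Tr(S^T B S) = sum over r, v, w of S_vr * B_vw * S_wr
    trace = sum(
        S[v][r] * B[v][w] * S[w][r]
        for r in range(num_groups)
        for v in range(n)
        for w in range(n)
    )
    return trace / two_m

# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for v, w in edges:
    A[v][w] = A[w][v] = 1

B = modularity_matrix(A)
row_sums = [sum(row) for row in B]                          # all zero up to rounding
q_undivided = modularity_trace(A, [0] * 6, 1)               # 0
q_split = modularity_trace(A, [0, 0, 0, 1, 1, 1], 2)        # 5/14
```

With a single group, $\mathbf {S}$ is a column of ones, so the trace reduces to the sum of all elements of $\mathbf {B}$, which vanishes because every row sums to zero.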

For networks divided into just two communities, one can alternatively define $s_{v}=\pm 1$ to indicate the community to which node $v$ belongs, which then leads to

$Q={1 \over 4m}\sum _{vw}B_{vw}s_{v}s_{w}={1 \over 4m}\mathbf {s} ^{\mathrm {T} }\mathbf {Bs} ,$

where $\mathbf {s}$ is the column vector with elements $s_{v}$.^[1]

This function has the same form as the Hamiltonian of an Ising spin glass, a connection that has been exploited to create simple computer algorithms, for instance using simulated annealing, to maximize the modularity. The general form of the modularity for arbitrary numbers of communities is equivalent to a Potts spin glass and similar algorithms can be developed for this case also.^[7]
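To make the simulated-annealing idea concrete, here is a rough sketch that flips single spins $s_{v}$ and accepts or rejects each flip with a Metropolis rule applied to $Q=\mathbf {s} ^{\mathrm {T} }\mathbf {Bs} /4m$. It is not the algorithm of any cited reference: the function names, the geometric cooling schedule, the step count and the example graph are all assumptions chosen for brevity.

```python
import math
import random

def q_two(B, s, two_m):
    """Q = s^T B s / (4m) for a +1/-1 membership vector s."""
    n = len(s)
    total = sum(B[v][w] * s[v] * s[w] for v in range(n) for w in range(n))
    return total / (2 * two_m)

def anneal_bisection(A, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a high-modularity two-way split by simulated annealing."""
    rng = random.Random(seed)
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    B = [[A[v][w] - k[v] * k[w] / two_m for w in range(n)] for v in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    q = q_two(B, s, two_m)
    best_s, best_q = s[:], q
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        v = rng.randrange(n)
        s[v] = -s[v]                       # propose flipping one spin
        q_new = q_two(B, s, two_m)
        if q_new >= q or rng.random() < math.exp((q_new - q) / t):
            q = q_new                      # accept the move
            if q > best_q:
                best_s, best_q = s[:], q
        else:
            s[v] = -s[v]                   # reject: undo the flip
    return best_s, best_q

# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for v, w in edges:
    A[v][w] = A[w][v] = 1

best_s, best_q = anneal_bisection(A)
```

Recomputing $Q$ from scratch after every flip keeps the sketch short; a practical implementation would update $Q$ incrementally from the row of $\mathbf {B}$ belonging to the flipped node.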

This article uses material from the Wikipedia article Modularity_(networks), and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

References

[1] Newman, M. E. J. (2006). "Modularity and community structure in networks". Proceedings of the National Academy of Sciences of the United States of America. 103 (23): 8577–8582. arXiv:physics/0602124. Bibcode:2006PNAS..103.8577N. doi:10.1073/pnas.0601602103. PMC 1482622. PMID 16723398.

[2] Newman, M. E. J. (2007). "Mathematics of networks". The New Palgrave Encyclopedia of Economics (2 ed.). Palgrave Macmillan, Basingstoke.

[3] Brandes, U.; Delling, D.; Gaertler, M.; Gorke, R.; Hoefer, M.; Nikoloski, Z.; Wagner, D. (February 2008). "On Modularity Clustering". IEEE Transactions on Knowledge and Data Engineering. 20 (2): 172–188. doi:10.1109/TKDE.2007.190689. S2CID 150684.

[4] van der Hofstad, Remco (2013). "Chapter 7" (PDF). Random Graphs and Complex Networks. Archived (PDF) from the original on 2013-12-18. Retrieved 2013-12-08.

[5] "NetworkScience". Albert-László Barabási. Archived from the original on 2020-03-05. Retrieved 2020-03-20.

[6] Clauset, Aaron; Newman, M. E. J.; Moore, Cristopher (2004). "Finding community structure in very large networks". Physical Review E. 70 (6): 066111. arXiv:cond-mat/0408187. Bibcode:2004PhRvE..70f6111C. doi:10.1103/PhysRevE.70.066111. PMID 15697438. S2CID 8977721.

[7] Reichardt, Joerg; Bornholdt, Stefan (2006). "Statistical mechanics of community detection". Physical Review E. 74 (1): 016110. arXiv:cond-mat/0603718. Bibcode:2006PhRvE..74a6110R. doi:10.1103/PhysRevE.74.016110. PMID 16907154. S2CID 792965.

[8] Peixoto, Tiago P. (2021). "Descriptive vs. inferential community detection: pitfalls, myths and half-truths". arXiv:2112.00183.

[9] Guimera, Roger; Sales-Pardo, Marta (2004). "Modularity from fluctuations in random graphs and complex networks". Physical Review E. 70 (2): 025101. arXiv:cond-mat/0403660. Bibcode:2004PhRvE..70b5101G. doi:10.1103/PhysRevE.70.025101. PMC 2441765. PMID 15447530.

[10] Fortunato, Santo; Barthelemy, Marc (2007). "Resolution limit in community detection". Proceedings of the National Academy of Sciences of the United States of America. 104 (1): 36–41. arXiv:physics/0607100. Bibcode:2007PNAS..104...36F. doi:10.1073/pnas.0605965104. PMC 1765466. PMID 17190818.

[11] Kumpula, J. M.; Saramäki, J.; Kaski, K.; Kertész, J. (2007). "Limited resolution in complex network community detection with Potts model approach". European Physical Journal B. 56 (1): 41–45. arXiv:cond-mat/0610370. Bibcode:2007EPJB...56...41K. doi:10.1140/epjb/e2007-00088-4. S2CID 4411525.

[12] Arenas, Alex; Fernández, Alberto; Gómez, Sergio (2008). "Analysis of the structure of complex networks at different resolution levels". New Journal of Physics. 10 (5): 053039. arXiv:physics/0703218. Bibcode:2008NJPh...10e3039A. doi:10.1088/1367-2630/10/5/053039. S2CID 11544197.

[13] Lancichinetti, Andrea; Fortunato, Santo (2011). "Limits of modularity maximization in community detection". Physical Review E. 84 (6): 066122. arXiv:1107.1155. Bibcode:2011PhRvE..84f6122L. doi:10.1103/PhysRevE.84.066122. PMID 22304170. S2CID 16180375.

[14] First implementation of Louvain algorithm. Archived from the original on 2021-03-17. Retrieved 2020-11-30.

[15] Leiden algorithm repository. 15 December 2021. Archived from the original on 2020-11-26. Retrieved 2020-11-30.

[16] Vienna graph clustering repository. 13 April 2021. Archived from the original on 2020-10-21. Retrieved 2020-11-30.

Node ID	1	2	3	4	5	6	7	8	9	10
1	0	1	1	0	0	0	0	0	0	1
2	1	0	1	0	0	0	0	0	0	0
3	1	1	0	0	0	0	0	0	0	0
4	0	0	0	0	1	1	0	0	0	1
5	0	0	0	1	0	1	0	0	0	0
6	0	0	0	1	1	0	0	0	0	0
7	0	0	0	0	0	0	0	1	1	1
8	0	0	0	0	0	0	1	0	1	0
9	0	0	0	0	0	0	1	1	0	0
10	1	0	0	1	0	0	1	0	0	0
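Reading the table above as the adjacency matrix of a 10-node undirected network, its modularity for a given partition can be computed as in Eq. 4. The partition used below, {1, 2, 3, 10}, {4, 5, 6}, {7, 8, 9}, is an assumption made for this sketch, as is the function name `modularity`.

```python
# Adjacency matrix copied from the table; row and column i stand for node i + 1.
A = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
]

def modularity(A, communities):
    """Q = sum_i (e_ii - a_i^2) for communities given as sets of 1-based node IDs."""
    two_m = sum(sum(row) for row in A)
    q = 0.0
    for group in communities:
        idx = [node - 1 for node in group]
        # Summing A over ordered pairs counts each internal edge twice,
        # which matches the numerator of e_ii.
        internal = sum(A[v][w] for v in idx for w in idx)
        degree_sum = sum(sum(A[v]) for v in idx)
        q += internal / two_m - (degree_sum / two_m) ** 2
    return q

m = sum(sum(row) for row in A) // 2                  # 12 edges
partition = [{1, 2, 3, 10}, {4, 5, 6}, {7, 8, 9}]    # assumed partition
q = modularity(A, partition)                         # 47/96, about 0.4896
```

For comparison, placing node 10 in a community of its own gives a lower value, $Q=23/48\approx 0.4792$.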
