Graph modularity in Python NetworkX

I have created a graph with the Python library NetworkX and I want to apply a modularity algorithm to group the nodes of my graph. I came across the following code:

import community  # provided by the python-louvain package
import matplotlib.pyplot as plt
import networkx as nx

# Load the weighted graph from an edge list file
G = nx.read_weighted_edgelist('graphs/fashionGraph_1.edgelist')
nx.transitivity(G)  # the result is not used below

# Find the best partition and its modularity
part = community.best_partition(G)
mod = community.modularity(part, G)

# Plot, coloring nodes by community
values = [part.get(node) for node in G.nodes()]
nx.draw_spring(G, cmap=plt.get_cmap('jet'), node_color=values, node_size=30, with_labels=False)
plt.show()

My graph has 4267 nodes and 3692 edges. The result is the following:

[Image: spring-layout plot of the graph with nodes colored by community]

I am a bit confused about how the graph nodes are grouped. What exactly is the logic behind the colors?

2 answers


From the documentation:

node_color: Node color. Can be a single color format string or a sequence of colors with the same length as nodelist. If numeric values are specified, they will be mapped to colors using the cmap and vmin, vmax parameters. See matplotlib.scatter for more details.

part = community.best_partition(G) assigns a community to each node. part is a dict, and part[node] is the community that the node belongs to (each community is identified by an integer). Later, values = [part.get(node) for node in G.nodes()] creates a list with the community number for each node, in the order the nodes appear in G.nodes().



Then, in the plotting command, it will use these community numbers to determine the color of the nodes. All nodes that have been assigned to the same community will have the same color.
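As a small illustration of that mapping, here is a sketch that uses a hand-written dict in place of the real output of community.best_partition(G). The node names and community numbers are made up, and the rescaling step only mimics what matplotlib does by default when vmin and vmax are not given (the smallest value maps to one end of the colormap, the largest to the other).

```python
from collections import defaultdict

# Hypothetical stand-in for the dict returned by community.best_partition(G);
# node names and community numbers are invented for illustration.
part = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
nodes = ["a", "b", "c", "d", "e"]  # stand-in for G.nodes()

# Same expression as in the question: one community number per node,
# in node order.
values = [part.get(node) for node in nodes]

# Default-style linear rescaling to [0, 1] before the colormap lookup.
vmin, vmax = min(values), max(values)
scaled = [(v - vmin) / (vmax - vmin) for v in values]

# Grouping nodes by community makes it easy to check which nodes
# end up sharing a color.
members = defaultdict(list)
for node, comm in part.items():
    members[comm].append(node)

print(values)         # community number per node
print(scaled)         # position of each node's color in the colormap
print(dict(members))  # nodes grouped by community
```

Nodes with equal entries in values get the same scaled number and therefore the same color, which is all the coloring in the plot means.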

The physical locations of the nodes are assigned by the spring layout. You can see that the spring layout seems to be putting nodes into positions that suggest some communities which are different from what community.best_partition finds. This is perhaps mildly surprising, but certainly nothing prevents it. It does make me think that the algorithm you used does not fully account for all of the structure in the network. The documentation for best_partition gives some explanation of the underlying algorithm.



Roughly speaking, nodes are grouped into communities, so that the ratio of connections within a community to intercommunity connections (a measure of modularity) is optimized.

The exact definition of modularity, from Wikipedia:

Modularity is the fraction of the edges that fall within the given groups minus the expected fraction if edges were distributed at random. The value of the modularity lies in the range [-1/2, 1]. It is positive if the number of edges within groups exceeds the number expected on the basis of chance. For a given division of the network's vertices into some modules, modularity reflects the concentration of edges within modules compared with a random distribution of links between all nodes regardless of modules.
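To make the definition concrete, here is a minimal sketch that computes modularity by hand for an unweighted, undirected graph, using Q = sum over communities c of (L_c / m - (d_c / (2m))^2), where m is the number of edges, L_c the number of edges inside c, and d_c the sum of the degrees of the nodes in c. The function name, the toy edge list (two triangles joined by one edge), and the partitions are made up for illustration; this is not the code of the community package.

```python
from collections import defaultdict

def modularity_by_hand(edges, part):
    """Modularity of partition `part` (node -> community) for an
    unweighted, undirected graph given as a list of edges."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in c
    for u, v in edges:
        degree_sum[part[u]] += 1
        degree_sum[part[v]] += 1
        if part[u] == part[v]:
            internal[part[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {node: 0 for node in range(6)}

q_two = modularity_by_hand(edges, two_groups)  # 5/14, about 0.357
q_one = modularity_by_hand(edges, one_group)   # 0: no better than chance
print(q_two, q_one)
```

Splitting along the two triangles scores higher than lumping everything together, which matches the intuition that modularity rewards groups with many internal edges and few edges between them.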

The algorithm implemented by the community package finds an approximate solution (a partition into communities) using an iterative process that starts with each node in its own community and keeps merging communities until modularity stops improving.
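The idea can be sketched in a few lines. The code below is a deliberately simplified illustration, not the package's implementation: it only does a greedy "move a node to a neighboring community if that raises modularity" pass, recomputes modularity from scratch at every step (the real algorithm uses an incremental gain formula), and leaves out the second phase in which communities are collapsed into super-nodes and the process repeats. All function names and the toy graph are made up for the example.

```python
from collections import defaultdict

def modularity_of(edges, part):
    """Modularity of an unweighted, undirected graph for partition `part`."""
    m = len(edges)
    internal, degree_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree_sum[part[u]] += 1
        degree_sum[part[v]] += 1
        if part[u] == part[v]:
            internal[part[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

def greedy_local_moving(edges):
    """Start with one community per node, then repeatedly move single
    nodes into a neighbor's community while modularity improves."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    part = {node: node for node in neighbors}  # each node alone at first
    best_q = modularity_of(edges, part)

    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            original = part[node]
            best_comm = original
            # Try every community that a neighbor currently belongs to.
            for comm in sorted({part[n] for n in neighbors[node]}):
                if comm == original:
                    continue
                part[node] = comm
                q = modularity_of(edges, part)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_q, best_comm = q, comm
            part[node] = best_comm
            if best_comm != original:
                improved = True
    return part, best_q

# Two triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
part, q = greedy_local_moving(edges)
print(part, q)
```

On this toy graph the pass ends with the two triangles as separate communities. On real graphs a greedy procedure like this can stop at a local optimum, which is why the result is only an approximate solution.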



More detailed information can be found in the paper describing the algorithm:

Fast unfolding of communities in large networks. V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre. Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008

(I was able to download and install it on Windows from https://pypi.python.org/pypi/python-louvain )
