how to find clusters with a network based on density and weight of edges in python - networkx package

4.3k Views Asked by At

I constructed a network using the python package - networkx, each edge has a weight which indicates how close the two nodes are, in terms of correlation.

It would be ideal if there is a built in algorithm that would return a clustered graph, assigning each node to it's cluster ID (1 to k).

It would be even better if it could cluster based on the weight of the edges, but not critical...

Any idea how this could be done?

2

There are 2 best solutions below

5
On

You might want to look into the package python-louvain. With it, you can detect communities in a graph using the function best_partition. From the function description:

Compute the partition of the graph nodes which maximises the modularity (or try..) using the Louvain heuristices

This is the partition of highest modularity, i.e. the highest partition of the dendrogram generated by the Louvain algorithm.

In my example I compute the communities for the karate_club_graph. (Note that I use best_partition with the weight keyword even though my graph doesn't have weighted edges -- I'm just showing how would you use the function in your case.)

import networkx as nx
import community

G = nx.karate_club_graph()
p = community.best_partition(G, weight='weight')
print(p)

Output:

{0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 0, 8: 2, 9: 0, 10: 1, 11: 0, 12: 0, 13: 0, 14: 2, 15: 2, 16: 1, 17: 0, 18: 2, 19: 0, 20: 2, 21: 0, 22: 2, 23: 3, 24: 3, 25: 3, 26: 2, 27: 3, 28: 3, 29: 2, 30: 2, 31: 3, 32: 2, 33: 2}

The output is a dictionary (key = node, value = partition). The partitions go from 0 to k-1. If you need them to go from 1 to k, you can just bump the dictionary values to +1.

for k, v in p.items():
    p[k] = v + 1
0
On

this can help networkx.algorithms.community.label_propagation.asyn_lpa_communities

as per doc.

Returns communities in G as detected by asynchronous label propagation. The weight of an edge is used in determining the frequency with which a label appears among the neighbors of a node.