Calculation of Shortest Paths in a Directed Graph takes much longer than calculating Betweenness Centrality

295 Views Asked by At

First of all, I am trying to calculate the Betweenness Centrality of a fully connected directional graph with edge weights with N=3015 nodes. Matlab can do this in about 30 seconds and Python igraph can do it in a minute while Networkx takes hours as it is written in python and so I have abandoned it.

Second, I need to get a list of all of the shortest paths (those that minimize weights along path) between all pairs of nodes. In matlab there is a function to do this called shortest path. There is also a similar function in igraph called get_shortest_paths. I can only get one path back from each call.

So I loop over all $N^2$ pairs of nodes and extract the shortest path, store it in an array and then continue.

The problem is that while I can get betweenness centrality in under a minute, extracting all of the shortest paths takes hours in both matlab and igraph.

This seems strange given that betweenness centrality is based on the calculation of all of the shortest paths. So why does the extraction of the shortest paths take so much longer ?

2

There are 2 best solutions below

0
Vincent Traag On

You probably want to use the function get_all_shortest_paths instead of get_shortest_path. Indeed, get_shortest_path returns only one shortest path, but get_all_shortest_paths will return all shortest paths.

0
alle_meije On

The computation of all shortest paths as the independent application of the single-shortest-path algorithm for all points is not very efficient.

You can imagine that if the shortest paths from points A and B to another point C largely overlap, there must be a way to reuse the knowledge of building the path from A to C for quickly finding the path of B to C. In essence, that is the difference between algorithms such as Dijkstra's algorithm (for single-source shortest paths) and others like the Floyd-Warshall algorithm (for all shortest paths) in igraph, in networkx and in matlab I think it's distances (which uses Johnson's algorithm, designed for sparse graphs - rather than the dense graphs you describe).