I am currently trying to compute shortest paths between every pair of connected nodes in a very large weighted graph (160K nodes, 15 million edges).
For every pair of connected nodes, I would like to find the number of edges in the shortest path between them after deleting all edges whose weight is >= the weight of the edge joining that pair. If no such path exists, I will take the shortest path length as 1 (i.e. just the deleted edge connecting the two nodes).
Is it possible to do this with igraph v0.9.11 (essentially GraphBase.shortest_paths() with an option to ignore edges whose weight is above a given threshold)? I would rather not hand-code a solution given the size of the graph (I'm guessing the igraph implementation would be better, and hopefully written in a faster non-Python language). Ideally, it would look something like this:
import igraph as ig

g = ig.Graph.TupleList(data_all.itertuples(index=False),
                       directed=False,
                       edge_attrs=['percent', 'length_difference'])

shortest_paths = []
for edge in g.es:
    threshold_weight = edge.attributes()['length_difference']
    # hypothetical method: like shortest_paths(), but ignoring edges whose weight
    # is at or above the threshold; should not alter the original graph object
    shortest = g.my_pretend_shortest_paths(source=edge.source, target=edge.target,
                                           weights='length_difference', mode='all',
                                           ignore_weights_over=threshold_weight)
    if shortest == []:
        shortest_paths += [(edge.source, edge.target, threshold_weight, 1)]
    else:
        shortest_paths += [(edge.source, edge.target, threshold_weight, len(shortest))]
My current solution is below, but I'm hoping there's a better way than copying a big graph and doing costly graph operations outside of shortest_paths() 15 million times. It takes 600-700 seconds to analyse 500 edges within a test graph of only ~250K edges (giving a predicted runtime of ~230 days for 15 million edges):
import copy

g = ig.Graph.TupleList(df.itertuples(index=False),
                       directed=False,
                       edge_attrs=['percent', 'length_difference'])

shortest_paths = []
for edge in g.es:
    temp_g = copy.deepcopy(g)
    threshold_weight = edge.attributes()['length_difference']
    # delete edges in the graph copy with a length_difference edge attribute >= the current edge's
    # https://stackoverflow.com/a/28951383/11357695
    temp_g.es.select(length_difference_ge=threshold_weight).delete()
    shortest = temp_g.shortest_paths(source=edge.source, target=edge.target,
                                     weights='length_difference', mode='all')
    if shortest == []:
        shortest_paths += [(edge.source, edge.target, threshold_weight, 1)]  # will plot these downstream
    else:
        shortest_paths += [(edge.source, edge.target, threshold_weight, len(shortest))]  # will plot these downstream
Ideally, this will complete within about 12 hours max - I can leave it overnight but beyond that it gets a bit awkward.
Thanks, Tim
EDIT
I have optimised it further by removing the copy of the large graph and cutting out redundant processing, but it still takes about a minute to process 1,000-2,000 edges (i.e. ~1 week for all 15 million edges, assuming everything scales linearly). Any better solutions would be appreciated!
def ec_list_to_g(es):
    # rebuild a standalone graph from a sequence of edges, keeping their attributes
    edges_formatted = []
    for e in es:
        att = e.attributes()
        edges_formatted += [(e.source, e.target, att['percent'], att['length_difference'])]
    return ig.Graph.TupleList(edges_formatted,
                              directed=False,
                              edge_attrs=['percent', 'length_difference'])

g = ig.Graph.TupleList(df.itertuples(index=False),
                       directed=False,
                       edge_attrs=['percent', 'length_difference'])

shortest_paths = []
for edge in g.es:
    threshold_weight = edge.attributes()['length_difference']
    if threshold_weight <= 1:
        continue
    # build a smaller graph containing only the edges below the threshold
    target_es = g.es.select(length_difference_lt=threshold_weight)
    sub_g = ec_list_to_g(target_es)
    shortest = sub_g.shortest_paths(source=edge.source, target=edge.target,
                                    weights='length_difference', mode='all')
    # follow on as before
As of igraph 0.10 (and in all earlier versions), graph modifications such as deleting edges are expensive: the graph is rebuilt from scratch, so the operation takes time proportional to the size of the whole graph even if you remove only a single edge.
When you perform weighted shortest path calculations, an efficient way to effectively disable an edge without actually removing it is to set its weight to infinity (float('inf')). Using this technique is likely to improve performance significantly.
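As a rough illustration of that idea (a sketch, not a drop-in solution): the snippet below reuses g and the 'length_difference' attribute from the question, builds a per-call weight list in which every edge at or above the threshold is set to float('inf'), and uses get_shortest_paths(..., output='epath') instead of shortest_paths() so that the number of edges in the path falls out directly. It assumes your igraph version accepts infinite weights in this call and treats infinite-weight edges as unusable, as described above.

inf = float('inf')
weights = g.es['length_difference']   # plain Python list; the graph itself is never modified
shortest_paths = []
for edge in g.es:
    threshold_weight = edge['length_difference']
    # "delete" heavy edges for this query only, by making them infinitely expensive
    masked = [w if w < threshold_weight else inf for w in weights]
    epath = g.get_shortest_paths(edge.source, to=edge.target,
                                 weights=masked, mode='all',
                                 output='epath')[0]
    # an empty edge path means no finite-weight route exists; count it as 1, as in the question
    n_edges = len(epath) if epath else 1
    shortest_paths += [(edge.source, edge.target, threshold_weight, n_edges)]

If you only need the weighted distance rather than the hop count, the same masked list can be passed to shortest_paths() exactly as in your current code, and unreachable pairs come back as inf.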