Shortest path with 10,0000 nodes, 5 billion edges.
Suppose matrix $A$ is the symmetric weighted ajacency matrix (Undirected Graph), and element $A_{ij}$ represents the weight of edge $e_{ij}$, suppose the matrix size is $100000\times 100000$, so here are $n(n-1)/2=4999950000$ edges, essentially 5 billion edges, I wish to obtain two matrixes $L$ and $W$:
Let $P_{ij}=\left{v_1, v_2, \cdots, v_n\right}$ represents one of the shortest path with $n$ nodes from $i$ to $j$ (or $j$ to $i$ due to symmetric), hence we can calculate the following two quanties [L_{ij} = n-1, W_{ij} = \sum_{i=1}^{n-1}A_{v_i,v_{i+1}}]
(1) The Matrix $L$ of weighted shortest path length, whiere $L_{ij}$ represents the length of shortest from node $i$ to node $j$.
(2) The matrix $W$ of shortest path cumulative edge weights, where $W_{ij}$ represents the cumulative edge weight along the path.
Here I got that for all pair shortest path, there are algorithms like Floyd Warshall, Johnson algorithms, but it would require hundreds of days to calculate all pairs path length, what is the possible suitable solution for this?
I tried Python igraph with python code
import tqdm
import numpy as np
import pandas as pd
import igraph as ig
def all_pairs_shortest_length(ajacency_matrix):
m, n = ajacency_matrix.shape # ajacency_matrix is weighted
L = np.zeros((m,n), dtype=int) # L is symmetric
G = ig.Graph.Weighted_Adjacency(ajacency_matrix, mode = 'undirected') # 340G memory to construct graph with 65000 nodes
for v in tqdm.trange(m):
# v as start node, only calculate target nodes from v+1 to m
paths_rest = ig.GraphBase.get_all_shortest_paths(G, v=v, to = range(v+1, m), weights=G.es["weight"], mode="out")
target_with_length = pd.DataFrame([[path[-1],len(path)-1] for path in paths_rest], columns=['target','length']).drop_duplicates(ignore_index=True)
# for weighted shortest path, choose the path with minimum number of edges
L[v,v+1:] = target_with_length.groupby('target', group_keys=True).apply(min).length
L = L + L.T
M = np.array(ig.GraphBase.distances(G, weights=G.es["weight"]))
return L, M
if __name__=='__main__':
test_N = 100000
b = np.random.random_integers(1,100, size=(test_N, test_N))
ajacency_matrix = (b + b.T)/2
np.fill_diagonal(ajacency_matrix, 0)
L, M = all_pairs_shortest_length(ajacency_matrix)
The Dijkstra algorithm is the fastest way to find the shortest paths between nodes. One run of the algorithm finds the shortest paths from one node to every other node. To find the shortest path between every pair of nodes will require many runs on a directed graph. An undirected graph will require fewer runs since the shortest path between u and v is just the reverse of the path between v and u. You do not tell us if your graph is directed or not.
In any case, this does take a long time for large graphs, though if your implementation of Dijkstra is well coded you will need minutes rather than days.
Getting reasonable performance from code written in Python will be a challenge. If you care about performance, you should consider a compiled coding language such as C++ which can give up to 50 times faster run times than the same algorithm coded in Python.
Here is some C++ code implementing the Dijkstra algorithm.
You can find the code for the complete path finding application at https://github.com/JamesBremner/PathFinder
Run time for running the Dijkstra algorithm to find the cheapest paths from one randomly selected vertex to every other vertex on large graphs downloaded from https://dyngraphlab.github.io/. Longest result from 3 runs using the graphex application.
Here is an introduction the Dijkstra algorithm https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm