Improve performance of NetworkX graphviz_layout for a large number of nodes and edges


I have a network graph dataset with around 12.5k root nodes and 70k edges, so the full graph is huge. However, end users will not view the graph in its entirety; they will filter on certain root nodes and see only the corresponding network chart. The network is essentially a lineage of objects, so it has distinct levels, and I prefer the "dot" layout because it gives a top-to-bottom hierarchical view, similar to an org chart.

I am using the code below to compute node positions with networkx's graphviz_layout and the dot program. However, it crashes the Python kernel due to memory issues.

from networkx.drawing.nx_agraph import graphviz_layout

# Compute node positions for the subgraph (other Graphviz programs can be passed via prog)
pos = graphviz_layout(subgraph, prog='dot')
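Since users only ever look at the lineage under the roots they select, one option is to lay out just that part rather than the whole graph. The sketch below collects every node reachable from the chosen roots with a breadth-first search. It assumes the graph is directed from parent to child; `lineage_nodes` and the `successors` callable are illustrative names, not part of networkx.

```python
from collections import deque


def lineage_nodes(successors, roots):
    """Return the set of roots plus every node reachable from them.

    successors is any callable mapping a node to an iterable of its children.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in successors(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy adjacency mapping standing in for the real lineage graph.
toy_edges = {"a": ["b", "c"], "b": ["d"], "x": ["y"]}
selected = lineage_nodes(lambda n: toy_edges.get(n, ()), ["a"])
```

With a networkx DiGraph `G`, something like `G.subgraph(lineage_nodes(G.successors, roots))` would give the smaller graph to pass to graphviz_layout. For a single root, `nx.descendants(G, root)` returns the same set minus the root itself.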

I also tried processing each node in a loop with joblib's Parallel on 3 workers to speed things up (I have 4 CPU cores and 16 GB RAM on Windows).

from joblib import Parallel, delayed
from tqdm import tqdm

# Define a function to calculate the layout for a single node
def calculate_layout(node, subgraph):
    # Lays out the subgraph induced by this one node only
    return node, graphviz_layout(subgraph.subgraph([node]), prog='dot')

max_workers = 3
results = Parallel(n_jobs=max_workers, verbose=3)(
    delayed(calculate_layout)(node, subgraph)
    for node in tqdm(subgraph.nodes(), total=len(subgraph.nodes()))
)

pos = dict(results)

Here too, the process terminates because of memory usage.
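Computing a layout for every node up front may not be necessary if only a few roots are viewed at a time. Below is a minimal sketch of computing a layout the first time a root is requested and reusing it afterwards. `make_layout_cache`, `build_subgraph` and `layout` are hypothetical names: in practice `build_subgraph` would return the lineage subgraph for a root and `layout` would wrap `graphviz_layout(..., prog='dot')`. Stand-in functions are used here so the sketch runs without Graphviz.

```python
def make_layout_cache(build_subgraph, layout):
    """Return a function that lays out one root's subgraph on first request."""
    cache = {}

    def layout_for(root):
        if root not in cache:
            cache[root] = layout(build_subgraph(root))
        return cache[root]

    return layout_for


# Stand-in callables (assumptions for illustration only).
calls = []


def fake_build(root):
    calls.append(root)  # record how often the subgraph is rebuilt
    return [root, root + "_child"]


def fake_layout(nodes):
    return {node: (0.0, -float(i)) for i, node in enumerate(nodes)}


layout_for = make_layout_cache(fake_build, fake_layout)
first = layout_for("r1")
second = layout_for("r1")  # served from the cache, no second build
```

Memory then grows with the number of roots actually viewed rather than with the whole graph; an eviction policy could be added if that still becomes too large.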

[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
[Parallel(n_jobs=3)]: Done  26 tasks      | elapsed:   14.2s
[Parallel(n_jobs=3)]: Done 122 tasks      | elapsed:  1.1min
[Parallel(n_jobs=3)]: Done 282 tasks      | elapsed:  2.6min
[Parallel(n_jobs=3)]: Done 506 tasks      | elapsed:  4.7min
[Parallel(n_jobs=3)]: Done 794 tasks      | elapsed:  7.8min
[Parallel(n_jobs=3)]: Done 1146 tasks      | elapsed: 12.4min

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
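If dot remains too heavy even on a filtered subgraph, a much cheaper layered placement might be acceptable as a fallback. The sketch below is not dot's algorithm (it does no crossing minimization or edge routing); it only puts each node on a row given by its longest distance from a root. It assumes the lineage is a DAG given as (parent, child) pairs, and `layered_positions` is an illustrative name.

```python
from collections import defaultdict, deque


def layered_positions(edges):
    """Return {node: (x, y)} with y = -(longest distance from any root)."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Kahn-style traversal: a node is processed once all its parents are done.
    layer = {node: 0 for node in nodes if indegree[node] == 0}
    queue = deque(sorted(layer, key=str))
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for child in children[node]:
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if processed != len(nodes):
        raise ValueError("edges contain a cycle; this sketch assumes a DAG")

    # Spread the nodes of each layer evenly around x = 0.
    by_layer = defaultdict(list)
    for node in sorted(nodes, key=str):
        by_layer[layer[node]].append(node)
    pos = {}
    for depth, members in by_layer.items():
        offset = (len(members) - 1) / 2
        for i, node in enumerate(members):
            pos[node] = (i - offset, -depth)
    return pos


demo_pos = layered_positions([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
```

The returned dictionary maps nodes to coordinates, the same shape graphviz_layout returns, so it could be passed as `pos` to the networkx drawing functions.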

Is there any better or more efficient way to accomplish this?
