Networkx maximal_independent_set reproducibility

189 Views Asked by At

How can I get reproducible results in a Jupyter Notebook (Python3)?

Defining a seed for the main random generators seems to be not enough, see MWE below:

import numpy as np
import random
import os 

random.seed(0)
np.random.seed(0)
os.environ['PYTHONHASHSEED']=str(0)
import networkx
from networkx.algorithms.mis import maximal_independent_set

G = networkx.Graph()
G.add_edges_from([ ('A', 'B'), ('B', 'C'), ('C', 'D'), ('D','E'), ('E','F'), ('A','E'), ('B','E') ])

for i in range(0,10):
   print( maximal_independent_set(G, seed=0) )

Gives the same result in each run in the loop.

However, when restarting the kernel and running the cells again, results change to another subset.

1

There are 1 best solutions below

1
On BEST ANSWER

The short answer for your question is: In the current implementation it is not reproducible - if taking restarts of kernel into account.

Long Answer

You need to use OrderedGraph and re-implement the function maximal_independent_set using an OrderedSet (see Does Python have an ordered set?) to may ensure the reproducibility you are looking for. Alternatively to OrderedSets, you can ensure the ordering for the casts from the sets to the lists by sorting, see here.

Debugging - based on the implementation

What is the cause of the problem. First observation, it's not the random generator. That is correctly instantiated and has the same state over multiple runs (At the beginning and after the end - check with seed.getstate(). The problematic calls within the maximal_independent_set are all set operations. For your mwe, for example:

neighbors = set.union(*[set(G.adj[v]) for v in nodes])
# next line added for simple debugging:
print(nodes, neighbors, G.adj["D"])
# Output 1: ['D', 'A', 'F']
# {'D'} {'E', 'C'} OrderedDict([('C', OrderedDict()), ('E', OrderedDict())])
# ['D', 'F', 'B']
# {'D'} {'C', 'E'} OrderedDict([('C', OrderedDict()), ('E', OrderedDict())])

As you see, multiple runs result into different sets (with respect to order). The same holds for the other seed, especially, available_nodes. Hence, fixing the selected list id by fixing the random number generator does not ensure reproducibility, since the sorting of the elements within the list can be different.