How can I get reproducible results in a Jupyter Notebook (Python3)?
Defining a seed for the main random generators seems to be not enough, see MWE below:
import numpy as np
import random
import os
random.seed(0)
np.random.seed(0)
os.environ['PYTHONHASHSEED']=str(0)
import networkx
from networkx.algorithms.mis import maximal_independent_set
G = networkx.Graph()
G.add_edges_from([ ('A', 'B'), ('B', 'C'), ('C', 'D'), ('D','E'), ('E','F'), ('A','E'), ('B','E') ])
for i in range(0,10):
print( maximal_independent_set(G, seed=0) )
Gives the same result in each run in the loop.
However, when restarting the kernel and running the cells again, results change to another subset.
The short answer for your question is: In the current implementation it is not reproducible - if taking restarts of kernel into account.
Long Answer
You need to use
OrderedGraph
and re-implement the functionmaximal_independent_set
using an OrderedSet (see Does Python have an ordered set?) to may ensure the reproducibility you are looking for. Alternatively to OrderedSets, you can ensure the ordering for the casts from the sets to the lists by sorting, see here.Debugging - based on the implementation
What is the cause of the problem. First observation, it's not the random generator. That is correctly instantiated and has the same state over multiple runs (At the beginning and after the end - check with
seed.getstate()
. The problematic calls within themaximal_independent_set
are allset
operations. For your mwe, for example:As you see, multiple runs result into different sets (with respect to order). The same holds for the other seed, especially,
available_nodes
. Hence, fixing the selected list id by fixing the random number generator does not ensure reproducibility, since the sorting of the elements within the list can be different.