Can I perform Keras training in a deterministic manner?


I'm using a Keras Sequential model where the inputs and labels are exactly the same each run. Keras is using a TensorFlow backend.

I've set the layer weight and bias initializers to 'zeros' and disabled batch shuffling during training.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128,
                activation='relu',
                kernel_initializer='zeros',
                bias_initializer='zeros'))
...

model.compile(optimizer='rmsprop', loss='binary_crossentropy')

model.fit(x_train, y_train,
          batch_size=128, verbose=1, epochs=200,
          validation_data=(x_validation, y_validation),
          shuffle=False)

I've also tried seeding NumPy's random number generator:

np.random.seed(7) # fix random seed for reproducibility

With the above in place I still receive different accuracy and loss values after training.
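
I'm aware TensorFlow also has its own graph-level seed; a fuller seeding setup might look like the sketch below (assuming the TF1-style API; I haven't confirmed this removes the variance):

import os
import random
import numpy as np
import tensorflow as tf

# Note: PYTHONHASHSEED only takes effect if set before the Python process starts
os.environ['PYTHONHASHSEED'] = '0'
random.seed(7)         # Python's built-in RNG
np.random.seed(7)      # NumPy's RNG
tf.set_random_seed(7)  # TensorFlow's graph-level seed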

Am I missing something or is there no way to fully remove the variance between trainings?

2 Answers

Answer 1

Since this seems to be a real issue, as commented above, you could try manually initializing the weights yourself (instead of trusting the 'zeros' initializer passed to the layer constructor):

import numpy as np

# Where you see layers[0], it's possible that the correct layer is
# layers[1] - I can't test at this moment.
weights = model.layers[0].get_weights()
ws = np.zeros(weights[0].shape)  # kernel
bs = np.zeros(weights[1].shape)  # bias
model.layers[0].set_weights([ws, bs])
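
If the model has more than one weighted layer, the same idea extends to all of them (a sketch; it assumes every array returned by get_weights() should be zeroed):

for layer in model.layers:
    weights = layer.get_weights()
    if weights:  # skip layers without weights, e.g. Dropout
        layer.set_weights([np.zeros_like(w) for w in weights])
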
Answer 2

It seems the problem occurs during training, not initialization. You can check this by first initializing two models, model1 and model2, and running the following code:

w1 = model1.get_weights()
w2 = model2.get_weights()

for i in range(len(w1)):
    w1i = w1[i]
    w2i = w2[i]
    assert np.allclose(w1i, w2i), (w1i, w2i)
    print("Weights %i were equal." % i)

print("All initial weights were equal.")

Even though all the assertions passed, training model1 and model2 with shuffle=False yielded different models. That is, if I perform similar assertions on the weights of model1 and model2 after training, the assertions all fail. This suggests that the problem lies in randomness introduced during training.
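
One commonly suggested source is parallelism inside the TensorFlow session; a mitigation sketch (TF1-era API; not verified here to remove all variance, and GPU kernels may remain nondeterministic regardless) is to force single-threaded execution:

import tensorflow as tf
from keras import backend as K

# Single-threaded session: parallel reductions can change the
# floating-point summation order between runs
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)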

As of this post, I have not managed to figure out how to circumvent this.