Different optimization with different TF versions

I'm trying to train a convolutional neural network with Keras on TensorFlow 2.6; I previously did the same with TensorFlow 1.11. I think the migration went okay (both networks converge), but the results are very different, and worse in TF 2.6. I used the Adam optimizer in both cases with the same hyperparameters (learning_rate = 0.001), yet the loss optimizes better in TF 1.11 than in TF 2.6.
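For reference, here is a minimal sketch (not my actual model code) of how the Adam configuration can be pinned explicitly in both versions. One known difference: the default epsilon is 1e-08 in `tf.train.AdamOptimizer` but 1e-07 in `tf.keras.optimizers.Adam`, which by itself can change convergence.

```python
import tensorflow as tf

# TF 1.11 (graph mode) equivalent, shown for comparison:
# optimizer = tf.train.AdamOptimizer(
#     learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)

# TF 2.6 (Keras API), with every hyperparameter spelled out:
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-08,  # match the TF1 default; the Keras default is 1e-07
)
```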

I'm trying to find out where the differences could come from. What should be taken into account when working across different TF versions? Can there be significant numerical differences between them? I know that in TF 1.x the default mode is graph execution and in TF 2 it is eager execution, but I don't know whether this could cause different training behavior.
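To rule the execution mode in or out, here is a small sketch (independent of my model) that checks the current mode and optionally restores the TF1 behavior:

```python
import tensorflow as tf

# TF 2.x executes eagerly by default, while TF 1.x built a static graph.
print(tf.executing_eagerly())  # True in TF 2.6

# Keras already traces model.fit() into a graph (a tf.function) unless
# compile(..., run_eagerly=True) is passed, so eager mode alone is an
# unlikely culprit. To restore the full TF1 behavior for comparison,
# call this before building any model:
tf.compat.v1.disable_eager_execution()
```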

What surprises me is how much more the loss is reduced in the first epochs under TF 1.11, and that it reaches a lower value at the end of training.

[Figure: training loss curves comparing TF 1.11 and TF 2.6]

There is 1 answer below.

Answered by Jirayu Kaewprateep:

You are correct that the two versions work in different execution modes, eager and graph. But the loss function itself only measures how far the current values are from the target; how much they change on each step is determined by the optimization method you configured.

  1. You cannot directly compare one model's training history to another. Running them several times, you will find TF 1 is faster and reaches smaller values of the loss function; it is worth reviewing the changelog for what changed between versions.
  2. Loss functions have been updated. The graph is the powerful technique we know, but TF 2.x also supports accessing values at runtime, which is why you get convenient delegated methods such as callbacks (see the sketch after this list), dynamic functions, and updating values during a run. (It is instructive for students or users to compare both versions on the same tasks.)
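
A minimal sketch of the runtime access mentioned in point 2: a Keras callback that reads the loss at the end of every epoch, something that required session hooks in TF 1.x graph mode. The model and data names in the usage line are hypothetical.

```python
import tensorflow as tf

# Reads the running loss value after each epoch via the callback API.
class LossLogger(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch}: loss = {logs['loss']:.6f}")

# Usage (hypothetical model/data names):
# model.fit(x_train, y_train, epochs=10, callbacks=[LossLogger()])
```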

Symmetric (identical) methods should not create different results.

Sample
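Below is a minimal, self-contained sketch of such a comparison; the data, architecture, and seed are placeholders, and the point is to pin every hyperparameter so both TF versions run the same configuration.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
np.random.seed(42)

# Placeholder data standing in for the real dataset.
x = np.random.rand(256, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(256,))

# A small placeholder CNN.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Pin every Adam hyperparameter, including epsilon, so the TF1 run
# (tf.train.AdamOptimizer) and the TF2 run use the same numbers.
model.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

history = model.fit(x, y, batch_size=32, epochs=5, verbose=2)
print(history.history["loss"])
```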