I am currently using the AWS SageMaker Python SDK to train an EfficientNet model (https://github.com/qubvel/efficientnet) on my data. Specifically, I use the TensorFlow estimator as shown below. This code runs in a SageMaker notebook instance.
import sagemaker
from sagemaker.tensorflow.estimator import TensorFlow

### sagemaker version = 1.50.17, python version = 3.6
estimator = TensorFlow("train.py", py_version = "py3", framework_version = "2.1.0",
                       role = sagemaker.get_execution_role(),
                       train_instance_type = "ml.m5.xlarge",
                       train_instance_count = 1,
                       image_name = 'xxx.dkr.ecr.xxx.amazonaws.com/xxx',
                       hyperparameters = {...},  ### list of hyperparameters here: epochs, batch size
                       subnets = [xxx],
                       security_group_ids = [xxx])

estimator.fit({
    'class_1': 's3_path_class_1',
    'class_2': 's3_path_class_2'
})
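For reference, in SageMaker script mode the hyperparameters above arrive in train.py as command-line arguments. This is only a minimal sketch of how they could be parsed; the argument names mirror the placeholder hyperparameters and are illustrative, not my exact code:

### sketch only: SageMaker passes hyperparameters as "--key value" arguments,
### plus extra framework arguments such as --model_dir, hence parse_known_args
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=30)
parser.add_argument('--batch_size', type=int, default=64)
args, _ = parser.parse_known_args()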
The code in train.py contains the usual training procedure: getting the images and labels from S3, transforming them into the right array shape for the EfficientNet input, and splitting them into train, validation, and test sets. To get reproducible results, I call the following reset_random_seeds function (taken from If Keras results are not reproducible, what's the best practice for comparing models and choosing hyper parameters?) before building the EfficientNet model itself.
### code of train.py
import os
os.environ['PYTHONHASHSEED'] = str(1)
import numpy as np
import tensorflow as tf
import efficientnet.tfkeras as efn
import random

### tensorflow version = 2.1.0
### tf.keras version = 2.2.4-tf
### efficientnet version = 1.1.0

def reset_random_seeds():
    os.environ['PYTHONHASHSEED'] = str(1)
    tf.random.set_seed(1)
    np.random.seed(1)
    random.seed(1)

if __name__ == "__main__":
    ### code for getting the training data (the channel paths are read roughly
    ### as in the sketch after this block)
    ### ... (I have made sure that the training input is the same every time I re-run the code)
    ### end of code
    reset_random_seeds()
    model = efn.EfficientNetB5(include_top = False,
                               weights = 'imagenet',
                               input_shape = (80, 80, 3),
                               pooling = 'avg',
                               classes = 3)
    model.compile(optimizer = 'Adam', loss = 'categorical_crossentropy')
    model.fit(X_train, Y_train, batch_size = 64, epochs = 30, shuffle = True, verbose = 2)
    ### Prediction section here
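The omitted data-loading section reads the two channels passed to estimator.fit from the standard SageMaker channel environment variables. A simplified sketch (the directory variable names are illustrative, and the actual image preprocessing is left out):

### sketch only: SageMaker copies each S3 channel into the training container
### and exposes its local path as SM_CHANNEL_<CHANNEL_NAME>
import os

class_1_dir = os.environ['SM_CHANNEL_CLASS_1']   # local copy of 's3_path_class_1'
class_2_dir = os.environ['SM_CHANNEL_CLASS_2']   # local copy of 's3_path_class_2'
### ... load the images from these directories and build X_train / Y_train ...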
However, each time I run the training job from the notebook instance, I get a different result from the previous run. When I switch train_instance_type to "local", I get the same result on every run. Is the non-reproducible result therefore caused by the training instance type I have chosen? This instance (ml.m5.xlarge) has 4 vCPUs, 16 GiB of memory, and no GPU. If so, how can I obtain reproducible results on this training instance?
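For completeness, this is the extra determinism configuration I could add on top of reset_random_seeds to rule out thread-level nondeterminism on the CPU. It is only a sketch based on general TensorFlow advice, and I have not verified that it removes the variance on ml.m5.xlarge:

### sketch only: assumes the run-to-run variance comes from multi-threaded CPU ops;
### must be called before any TensorFlow op is executed
import os
import random
import numpy as np
import tensorflow as tf

def reset_random_seeds_strict():
    os.environ['PYTHONHASHSEED'] = str(1)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'                  # TF 2.1+ opt-in, mainly documented for GPU kernels
    tf.config.threading.set_intra_op_parallelism_threads(1)   # single-threaded op execution
    tf.config.threading.set_inter_op_parallelism_threads(1)   # no parallel op scheduling
    tf.random.set_seed(1)
    np.random.seed(1)
    random.seed(1)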
I also came across a related post here: Tensorflow: Different results with the same random seed