How to get reproducible results in Amazon SageMaker with the TensorFlow Estimator?


I am currently using the AWS SageMaker Python SDK to train an EfficientNet model (https://github.com/qubvel/efficientnet) on my data. Specifically, I use the TensorFlow estimator as shown below. This code runs in a SageMaker notebook instance.

import sagemaker
from sagemaker.tensorflow.estimator import TensorFlow
### sagemaker version = 1.50.17, python version = 3.6

estimator = TensorFlow("train.py", py_version = "py3", framework_version = "2.1.0",
                       role = sagemaker.get_execution_role(), 
                       train_instance_type = "ml.m5.xlarge", 
                       train_instance_count = 1,
                       image_name = 'xxx.dkr.ecr.xxx.amazonaws.com/xxx',
                       hyperparameters = {list of hyperparameters here: epochs, batch size},
                       subnets = [xxx], 
                       security_group_ids = [xxx])
estimator.fit({
   'class_1': 's3_path_class_1',
   'class_2': 's3_path_class_2'
})

The code in train.py contains the usual training procedure: getting the images and labels from S3, transforming them into the right array shape for EfficientNet input, and splitting them into train, validation, and test sets. To get reproducible results, I call the following reset_random_seeds function (taken from "If Keras results are not reproducible, what's the best practice for comparing models and choosing hyper parameters?") before building the EfficientNet model itself.

### code of train.py

import os
os.environ['PYTHONHASHSEED']=str(1)
import numpy as np
import tensorflow as tf
import efficientnet.tfkeras as efn
import random

### tensorflow version = 2.1.0
### tf.keras version = 2.2.4-tf
### efficientnet version = 1.1.0

def reset_random_seeds():
   os.environ['PYTHONHASHSEED']=str(1)
   tf.random.set_seed(1)
   np.random.seed(1)
   random.seed(1)

if __name__ == "__main__":

   ### code for getting training data
   ### ... (I have made sure that the training input is the same every time i re-run the code)
   ### end of code

   reset_random_seeds()
   model = efn.EfficientNetB5(include_top = False, 
      weights = 'imagenet', 
      input_shape = (80, 80, 3),
      pooling = 'avg',
      classes = 3)
   model.compile(optimizer = 'Adam', loss = 'categorical_crossentropy')
   model.fit(X_train, Y_train, batch_size = 64, epochs = 30, shuffle = True, verbose = 2)

   ### Prediction section here

However, each time I run the notebook, I get a different result from the previous run. When I switch train_instance_type to "local", I get the same result every time I run the notebook. Is the non-reproducible result therefore caused by the training instance type I have chosen? This instance (ml.m5.xlarge) has 4 vCPUs, 16 GiB of memory, and no GPU. If so, how can I obtain reproducible results on this training instance?

There is 1 best solution below

Is it possible that the inconsistent results come from tf.random.set_seed()? I came across a related post here: Tensorflow: Different results with the same random seed
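
If seeding is the suspect, a stricter seeding helper might be worth trying. The following is only a minimal sketch, not a confirmed fix: the names seed_everything and single_thread are hypothetical, and limiting TensorFlow to one thread rests on the assumption that multi-threaded CPU execution (which can change the order of floating-point operations) contributes to the differences on ml.m5.xlarge. Nothing in the question confirms that. Note also that setting PYTHONHASHSEED from inside a running script only affects child processes, because the current interpreter fixes its hash seed at startup.

```python
import os
import random


def seed_everything(seed=1, single_thread=True):
    """Seed every RNG that is importable; return the names of what was seeded.

    Hypothetical helper for illustration. Call it before building the model.
    """
    seeded = ["random"]

    # Only affects child processes: the running interpreter chose its
    # hash seed at startup, so this does not change hashing in this process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    try:
        import numpy as np
    except ImportError:
        pass
    else:
        np.random.seed(seed)
        seeded.append("numpy")

    try:
        import tensorflow as tf
    except ImportError:
        pass
    else:
        tf.random.set_seed(seed)
        seeded.append("tensorflow")
        if single_thread:
            # Assumption: single-threaded execution trades speed for a
            # fixed order of floating-point operations on CPU.
            try:
                tf.config.threading.set_intra_op_parallelism_threads(1)
                tf.config.threading.set_inter_op_parallelism_threads(1)
            except RuntimeError:
                # The TensorFlow runtime was already initialized, so the
                # thread settings can no longer be changed.
                pass

    return seeded
```

In train.py this would replace the call to reset_random_seeds() right before the model is created. If the results still differ between "local" and ml.m5.xlarge after that, the cause is probably somewhere other than seeding, for example in how the data is loaded or ordered.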