TensorFlow training gets stuck at a random step and produces a lot of warnings


I set

tf.config.experimental.set_device_policy('warn')

and when I call the fit() function, training can get stuck at a random step and epoch while producing this warning on every step:

W tensorflow/core/common_runtime/eager/execute.cc:169] before computing Shape input #0 was expected to be on /job:localhost/replica:0/task:0/device:GPU:0 but is actually on /job:localhost/replica:0/task:0/device:CPU:0 (operation running on /job:localhost/replica:0/task:0/device:GPU:0). This triggers a copy which can be a performance bottleneck.
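If I understand the warning correctly, an op placed on the GPU is receiving an input tensor that still lives in host memory, so the eager runtime copies it to the GPU every step. A minimal sketch that reproduces the same kind of warning (assuming a GPU device is visible; the op differs from the Shape op in my log):

import tensorflow as tf

tf.config.experimental.set_device_policy('warn')

# A tensor created on the CPU...
with tf.device('/CPU:0'):
    x = tf.constant([1.0, 2.0, 3.0])

# ...consumed by an op pinned to the GPU triggers the cross-device
# copy that the 'warn' policy reports.
with tf.device('/GPU:0'):
    y = tf.reduce_sum(x)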

tf.compat.v1.disable_eager_execution()

tends to fix the problem, but it removes some functionality that I need. Is there a way to fix this without disabling eager execution?
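One knob that doesn't require disabling eager execution is the device policy itself: setting it to 'silent' performs the same copies without logging (a sketch; as far as I can tell this only suppresses the warnings, so it won't help if the stall is unrelated to the copies):

import tensorflow as tf

# 'warn' (used above) logs every cross-device copy; 'silent' performs
# the copy quietly; 'explicit' raises an error instead of copying.
tf.config.experimental.set_device_policy('silent')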

EDIT 1 - It seems that disabling eager execution doesn't help after all.

EDIT 2 - Full code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks

path = 'Dogs/'

train_data = keras.utils.image_dataset_from_directory(
    path, subset='training', validation_split=0.2, seed=478,
    label_mode='categorical', batch_size=32)
test_data = keras.utils.image_dataset_from_directory(
    path, subset='validation', validation_split=0.2, seed=478,
    label_mode='categorical', batch_size=32)

def standardize_img(image, label):
    # Scale pixel values from [0, 255] to [0, 1].
    image = image / 255.0
    return image, label

AUTOTUNE = tf.data.AUTOTUNE
train_data = train_data.map(standardize_img).cache().prefetch(AUTOTUNE)
test_data = test_data.map(standardize_img).cache().prefetch(AUTOTUNE)

model = keras.Sequential([
    layers.Resizing(128, 128, interpolation='nearest'),
    layers.Conv2D(filters=32, activation='relu', padding='same', strides=1, kernel_size=3),
    layers.MaxPooling2D(),

    layers.Conv2D(filters=64, activation='relu', padding='same', strides=1, kernel_size=3),
    layers.Conv2D(filters=64, activation='relu', padding='same', strides=1, kernel_size=3),
    layers.MaxPooling2D(),

    layers.Conv2D(filters=128, activation='relu', padding='same', strides=1, kernel_size=3),
    layers.Conv2D(filters=128, activation='relu', padding='same', strides=1, kernel_size=3),
    layers.MaxPooling2D(),

    layers.Flatten(),
    layers.Dense(32, activation='relu', kernel_regularizer=keras.regularizers.L2(0.001)),
    layers.Dropout(0.4),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss=keras.losses.CategoricalCrossentropy(),
              metrics=['categorical_crossentropy', 'accuracy'])

early_stop = callbacks.EarlyStopping(min_delta=0.001, patience=15,
                                     restore_best_weights=True,
                                     monitor='val_categorical_crossentropy')

history = model.fit(train_data, validation_data=test_data, epochs=300,
                    callbacks=[early_stop])
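A variation of the input pipeline I'm considering (a sketch; tf.data.experimental.prefetch_to_device must be the last transformation in the pipeline, and it assumes the DirectML device is visible as '/GPU:0') keeps prefetched batches in GPU memory so fit() doesn't have to copy them from the host each step:

# Replaces the plain .prefetch(AUTOTUNE) calls in the script above.
train_data = train_data.map(standardize_img).cache().apply(
    tf.data.experimental.prefetch_to_device('/GPU:0'))
test_data = test_data.map(standardize_img).cache().apply(
    tf.data.experimental.prefetch_to_device('/GPU:0'))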

EDIT 3 - Logs: https://github.com/Delinester/logs

EDIT 4 - PC Specs:

Windows 10

Intel Core i3-8100

AMD Radeon RX 580 (running with the TensorFlow DirectML plugin)

16 GB of RAM
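
Given the DirectML setup, it may be worth sanity-checking that the RX 580 is actually registered as a GPU device (a sketch using standard tf.config / tf.debugging calls):

import tensorflow as tf

# With tensorflow-directml-plugin installed, the card should appear
# as a 'GPU' device; an empty list means everything runs on the CPU.
print(tf.config.list_physical_devices('GPU'))

# Log the device each op actually runs on; verbose, but it shows
# exactly where the CPU<->GPU copies happen.
tf.debugging.set_log_device_placement(True)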
