I'm trying to make a denoise autoencoder wherein the encoder part is vgg16 and decoder is opposite of vgg16(encoder) network. My dataset consists of 5K images in grayscale and these are the steps i've followed to prepare and normalize:
input_X = [], (list having 5K (noisy)images of dims 224,224,1)
input_Y = [], (list having 5K (ground truth)images of dims 224,224,1)
test_X = [] (list having 1K (noisy)images for testing of dims 224,224,1)
input_X = input_X/255.0
input_Y = input_Y/255.0
test_X = test_X/255.0
print(input_X.shape)
print(input_Y.shape)
print(test_X.shape)
input_X = np.repeat(input_X[..., np.newaxis], 3, -1) #creating 3 channels
input_Y = np.repeat(input_Y[..., np.newaxis], 3, -1)
test_X = np.repeat(test_X[..., np.newaxis], 3, -1)
print(input_X.shape) (5000,224,224,3)
print(input_Y.shape)
print(test_X.shape)
encoder network:
vggmodel = keras.applications.vgg16.VGG16()
model_encoder = Sequential()
num = 0
for i, layer in enumerate(vggmodel.layers):
if i<19:
model_encoder.add(layer)
model_encoder.summary()
for layer in model_encoder.layers:
layer.trainable=False
output:
Metal device set to: Apple M1 Pro
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
decoder network:
encoder_output = Input(shape=(7, 7, 512,))
x = Conv2D(512, (3, 3), activation='relu', padding='same')(encoder_output)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 4
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 3
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
x = UpSampling2D((2, 2))(x)
model_decoder = Model(inputs=encoder_output, outputs=x)
training:
model_decoder.compile(optimizer='Adam', loss='mean_squared_error' , metrics=['accuracy'])
model_decoder.fit(trainx_encoded, train_Y,
epochs=no_epocs,
batch_size=batch_size,
validation_split=validation_split)
Now while training, the loss and accuracy doesn't changes. I can think of reducing filters in the initial decoder layers but i fear that's going to affect the autoencoder. Here, i'm really clueless about what approach to follow.
training output:
Epoch 1/50
2022-04-27 22:10:05.336322: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
27/27 [==============================] - ETA: 0s - loss: 0.1678 - accuracy: 0.9689
2022-04-27 22:11:34.044172: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
27/27 [==============================] - 97s 4s/step - loss: 0.1678 - accuracy: 0.9689 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 2/50
27/27 [==============================] - 87s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 3/50
27/27 [==============================] - 86s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 4/50
27/27 [==============================] - 86s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 5/50
27/27 [==============================] - 91s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 6/50
27/27 [==============================] - 85s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 7/50
27/27 [==============================] - 87s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 8/50
27/27 [==============================] - 91s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 9/50
27/27 [==============================] - 85s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 10/50
27/27 [==============================] - ETA: 0s - loss: 0.1645 - accuracy: 1.0000
VGG16 is not trained to be used as an encoder for image reconstruction, it is trained to extract features from an image using which we can do classification task on the image.
This is why, you cannot use VGG16 as the encoder part of your denoise autoencoder.
However, if you want, you can use that architecture of VGG16 as the encoder part of your autoencoder, by just retraining those layers of VGG16,
Again, there is a problem with your current architecture of autoencoder -it downsamples too much, leading to a significant loss of data. A smaller architecture would work better. For example: