I am trying to train a model by incorporating VGG16 into the encoder network of an autoencoder, but the decoder network requires an input shape of (7,7,512). My data is grayscale while VGG16 requires 3 color channels, so I copied the data array three times along a channel axis; that part is not the problem. The problem is reshaping the array: it fails with an error. X_train and Y_train are lists containing the training datasets, 5k images each, 224x224 and grayscale. Then I did:
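import numpy as np

# Convert the lists to arrays and scale pixel values to [0, 1]
train_X = np.array(X_train)
train_Y = np.array(Y_train)
train_X = train_X / 255.0
train_Y = train_Y / 255.0
print(train_Y.shape)
# Add a channel axis and repeat it 3 times: (N, 224, 224) -> (N, 224, 224, 3)
train_Y = np.repeat(train_Y[..., np.newaxis], 3, -1)
print(train_Y.shape)
# Same for train_X
train_X = np.repeat(train_X[..., np.newaxis], 3, -1)
print(train_Y.shape)
print(train_X.shape)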
Output: (5000, 224, 224, 3) and (5000, 224, 224, 3)
trainx = train_X.reshape((7,7,512))
Error: ValueError: cannot reshape array of size 50176 into shape (7,7,512)
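The shape arithmetic behind this error can be checked with small NumPy arrays. A minimal sketch, assuming a stand-in batch of 4 zero images instead of the real 5000: reshape only rearranges existing values, so the element count must match exactly, and 7*7*512 = 25,088 differs from both 224*224 = 50,176 (one grayscale image, which is the size reported in the error) and 224*224*3 = 150,528 (one 3-channel image).

```python
import numpy as np

# Stand-in batch: 4 grayscale 224x224 images (the real data has 5000).
gray = np.zeros((4, 224, 224), dtype=np.float32)

# Same channel-replication step as in the question: add a channel axis, repeat it 3 times.
rgb = np.repeat(gray[..., np.newaxis], 3, -1)
print(rgb.shape)  # (4, 224, 224, 3)

target_elems = 7 * 7 * 512       # values per sample the decoder expects
per_gray_image = gray[0].size    # 224 * 224
per_rgb_image = rgb[0].size      # 224 * 224 * 3
print(per_gray_image, per_rgb_image, target_elems)

# reshape cannot add or drop values, so mismatched counts raise ValueError.
try:
    gray[0].reshape((7, 7, 512))
except ValueError as err:
    print("ValueError:", err)
```

No reshape can bridge that gap; an array of shape (7,7,512) per image has to be produced by a network that outputs it.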
The network I'm trying to train:
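from tensorflow.keras.layers import Input, Conv2D, UpSampling2D
from tensorflow.keras.models import Model

# Encoder output (VGG16 feature maps) is the decoder's input
encoder_input = Input(shape=(7, 7, 512))
# Decoder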
decoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_input)
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=encoder_input, outputs=decoder_output)
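The decoder's output shape can be traced by hand. A minimal pure-Python sketch; conv2d_same and upsampling2d are hypothetical helpers that only model the shape rules of Conv2D with padding='same' and the default stride of 1 (height and width unchanged, channels set by the filter count) and of UpSampling2D((2, 2)) (height and width doubled). Five upsamplings take 7x7 to 7 * 2**5 = 224, and the last Conv2D fixes the channel count at 2.

```python
def conv2d_same(shape, filters):
    """Shape rule for Conv2D(padding='same', strides=1): keep H and W, set channels."""
    h, w, _ = shape
    return (h, w, filters)


def upsampling2d(shape, size=(2, 2)):
    """Shape rule for UpSampling2D: multiply H and W by `size`, keep channels."""
    h, w, c = shape
    return (h * size[0], w * size[1], c)


# Layer sequence copied from the decoder above.
shape = (7, 7, 512)
shape = conv2d_same(shape, 256)
shape = conv2d_same(shape, 128)
shape = upsampling2d(shape)  # 14 x 14
shape = conv2d_same(shape, 64)
shape = upsampling2d(shape)  # 28 x 28
shape = conv2d_same(shape, 32)
shape = upsampling2d(shape)  # 56 x 56
shape = conv2d_same(shape, 16)
shape = upsampling2d(shape)  # 112 x 112
shape = conv2d_same(shape, 2)
shape = upsampling2d(shape)  # 224 x 224
print(shape)  # (224, 224, 2)
```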
The encoder is VGG16. Model summary:
Metal device set to: Apple M1 Pro
Model: "sequential"
_________________________________________________________________
Layer (type)                Output Shape              Param #
=================================================================
block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792
block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928
block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0
block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856
block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584
block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0
block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168
block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080
block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080
block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0
block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160
block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808
block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808
block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0
block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808
block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808
block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808
block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
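The summary also shows where (7,7,512) comes from: each of the five pooling layers halves the spatial size, so 224 / 2**5 = 7, and the last block has 512 filters. A minimal pure-Python sketch that re-derives the output shape and the 14,714,688 parameter count from the layer list above, assuming 3x3 convolutions with 'same' padding and one bias per filter, and 2x2 max pooling with stride 2; trace_vgg16 is a hypothetical helper, not a Keras function.

```python
# (number of 3x3 conv layers, filters) per block, read off the summary above.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_vgg16(input_shape=(224, 224, 3)):
    """Return (output shape, total parameter count) of the VGG16 conv base."""
    h, w, c = input_shape
    total_params = 0
    for n_convs, filters in VGG16_BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel over c input channels for each filter, plus one bias per filter.
            total_params += 3 * 3 * c * filters + filters
            c = filters
        # 2x2 max pooling with stride 2 halves height and width (floor division).
        h, w = h // 2, w // 2
    return (h, w, c), total_params


out_shape, n_params = trace_vgg16()
print(out_shape, n_params)  # (7, 7, 512) 14714688
```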
I have tried several workarounds, but I'm not able to get past this problem.
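You are trying to use the pretrained VGG_16 model as an encoder network. If I understand correctly, you are using the encoder and decoder to denoise images. The encoder is the pretrained VGG_16 model, and the decoder is a separate model (give them different names) that you train using train_Y as your ground truth for denoised images. The (7,7,512) decoder input cannot be obtained by reshaping the images; it is the feature map VGG_16 produces for each 224x224x3 image. So feed the images into the encoder network:
trainx_encoded = model_encoder.predict(train_X)  # model_encoder is the VGG16 model; result shape (5000, 7, 7, 512)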
Check the output shape of model_decoder and make sure it matches the shape of train_Y. Running this myself showed that the output layer has a shape of (224,224,2), while each train_Y sample is (224,224,3). You have two options:
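# Replace the Conv2D(2, ...) layer with one of these:
# Option 1: 3 output channels, matching the repeated train_Y of shape (5000, 224, 224, 3)
decoder_output = Conv2D(3, (3, 3), activation='tanh', padding='same')(decoder_output)
# Option 2: 1 output channel; keep train_Y grayscale with a single channel axis, shape (5000, 224, 224, 1)
decoder_output = Conv2D(1, (3, 3), activation='tanh', padding='same')(decoder_output)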
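Whichever option you pick, the targets passed to fit need the same per-sample shape as the decoder output (the shape reported by model_decoder.summary()). A minimal NumPy sketch with 4 stand-in images; matches_decoder is a hypothetical helper, and the decoder output shapes are written out by hand rather than read from a model.

```python
import numpy as np

# Stand-in grayscale targets already scaled to [0, 1]; the real train_Y has 5000 images.
gray_Y = np.zeros((4, 224, 224), dtype=np.float32)

# Option 1: decoder ends in Conv2D(3, ...), so targets need 3 identical channels.
train_Y_rgb = np.repeat(gray_Y[..., np.newaxis], 3, -1)

# Option 2: decoder ends in Conv2D(1, ...), so targets need a single channel axis.
train_Y_gray = gray_Y[..., np.newaxis]


def matches_decoder(targets, decoder_output_shape):
    """True if the per-sample target shape (batch axis dropped) equals the decoder output shape."""
    return targets.shape[1:] == tuple(decoder_output_shape)


print(matches_decoder(train_Y_rgb, (224, 224, 3)))   # True
print(matches_decoder(train_Y_gray, (224, 224, 1)))  # True
print(matches_decoder(train_Y_rgb, (224, 224, 2)))   # False: the original Conv2D(2, ...) output
```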
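Then use the encoded data as training data for the second model. The decoder must be compiled before fit; MSE is one common reconstruction loss:
model_decoder.compile(optimizer='adam', loss='mse')
model_decoder.fit(trainx_encoded, train_Y)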