ValueError: cannot reshape array of size 50176 into shape (7,7,512)

I am trying to train a model (incorporating VGG16 as the encoder network of an autoencoder), but the decoder network requires an input shape of (7, 7, 512). My data is grayscale while VGG16 requires 3 color channels, so I've repeated the data array three times along a new channel axis; that part is not the problem. The problem is where I try to reshape the array, which fails and gives me the error below. Code: X_train and Y_train are lists containing training datasets of 5k images each, with dims 224x224, in grayscale. After this I've done ->

import numpy as np

train_X = np.array(X_train)
train_Y = np.array(Y_train)

# scale pixel values to [0, 1]
train_X = train_X / 255.0
train_Y = train_Y / 255.0

print(train_Y.shape)  # (5000, 224, 224)
train_Y = np.repeat(train_Y[..., np.newaxis], 3, -1)  # copy the gray channel 3x
print(train_Y.shape) 

#same for train_X
 
print(train_Y.shape)
print(train_X.shape)

output -> (5000, 224, 224, 3) and (5000, 224, 224, 3)

trainx = train_X.reshape((7,7,512))

error: ValueError: cannot reshape array of size 50176 into shape (7,7,512)

The network I'm trying to train:

from tensorflow.keras.layers import Input, Conv2D, UpSampling2D
from tensorflow.keras.models import Model

#encoder output (7, 7, 512) feeds in here
encoder_input = Input(shape=(7, 7, 512))
#Decoder
decoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_input)
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=encoder_input, outputs=decoder_output)

The encoder is VGG16. Model summary:

Metal device set to: Apple M1 Pro
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

I have tried several hacks but I'm not able to get past this problem.

Answer:

You are trying to use the pretrained VGG16 model as an encoder network. If I understand correctly, you are using the encoder and decoder to denoise images: the encoder is the pretrained VGG16 model, and the decoder is a separate model (give them different names, e.g. model_encoder and model_decoder) that you train using train_y as the ground truth for denoised images. The reshape itself can never work: a single 224x224 grayscale image holds 224 * 224 = 50176 values, while a (7, 7, 512) tensor holds 7 * 7 * 512 = 25088. The (7, 7, 512) input has to come from running the images through the encoder, not from reshaping.

Feed the images into the encoder network.

trainx_encoded = model_encoder.predict(train_X)
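
For reference, here is a minimal sketch of that encoding step. It assumes the encoder is tf.keras.applications.VGG16 with include_top=False (which is consistent with the summary in the question); model_encoder is just the suggested rename from above:

from tensorflow.keras.applications import VGG16

# VGG16 without its dense head: output shape is (None, 7, 7, 512)
model_encoder = VGG16(weights='imagenet', include_top=False,
                      input_shape=(224, 224, 3))
model_encoder.trainable = False  # keep the pretrained weights frozen

# train_X has shape (5000, 224, 224, 3), so this yields (5000, 7, 7, 512)
trainx_encoded = model_encoder.predict(train_X, batch_size=32)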

Check model_decoder's output shape and make sure it matches the shape of train_y:

# print each layer's output shape
for layer in model_decoder.layers:
    print(layer.output_shape)

Running this myself showed that the output layer has a shape of (224, 224, 2). You have two options:

  1. Change the decoder network to have an output shape of (224, 224, 3) by updating the last conv layer to have 3 channels:

decoder_output = Conv2D(3, (3, 3), activation='tanh', padding='same')(decoder_output)

  2. Leave the train_y data as grayscale with one channel and update the above layer to have one channel:

decoder_output = Conv2D(1, (3, 3), activation='tanh', padding='same')(decoder_output)

Then use the encoded data as training data for the second model.

model_decoder.fit(trainx_encoded, train_y)
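
Putting it together, a short sketch of that training call; the optimizer, loss, epochs, and batch size here are illustrative assumptions, not something from the original post:

# option 2: keep the targets grayscale with an explicit channel axis
# (skip the np.repeat step for train_Y in that case)
train_y = train_Y[..., np.newaxis]  # (5000, 224, 224, 1)

model_decoder.compile(optimizer='adam', loss='mse')  # assumed loss/optimizer
model_decoder.fit(trainx_encoded, train_y, epochs=10, batch_size=32)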