I'm working on neural style transfer, and I'm trying to reconstruct the output of the convolutional layer conv4_2 of the VGG19 network.
def get_features(image, model):
    # Map torchvision VGG19 feature-module indices to layer names
    layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
              '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}
    x = image
    features = {}
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features
content_img_features = get_features(content_img, vgg)
style_img_features = get_features(style_img, vgg)
target_content = content_img_features['conv4_2']
content_img_features is a dict that contains the output of each of the selected layers.
target_content is a tensor of shape torch.Size([1, 512, 50, 50])
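Here vgg is assumed to be the convolutional feature stack of torchvision's VGG19, loaded roughly like this (a sketch; the exact setup isn't shown above):

from torchvision import models

# Assumed setup: only the feature extractor (conv/pool layers) is used,
# frozen so no gradients flow into the network weights
vgg = models.vgg19(pretrained=True).features
for p in vgg.parameters():
    p.requires_grad_(False)
vgg.eval()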
This is the method I use to convert a tensor back into a plottable image. It works fine for the input image as well as for the final output.
def tensor_to_image(tensor):
    image = tensor.clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)
    image *= np.array((0.22, 0.22, 0.22))+ np.array((0.44, 0.44, 0.44))
    image = image.clip(0, 1)
    return image
image = tensor_to_image(target_content)
fig = plt.figure()
plt.imshow(image)
But this throws the error,
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-188-a75a5f0743bb> in <module>()
1
----> 2 image = tensor_to_image(target_content)
3 fig = plt.figure()
4 plt.imshow(image)
<ipython-input-186-e9385dbc4a85> in tensor_to_image(tensor)
3 image = image.numpy().squeeze()
4 image = image.transpose(1, 2, 0)
----> 5 image *= np.array((0.22, 0.22, 0.22))+ np.array((0.44, 0.44, 0.44))
6 image = image.clip(0, 1)
7 return image
ValueError: operands could not be broadcast together with shapes (50,50,512) (3,) (50,50,512)
This is the initial transformation I apply to the image before passing it to the CNN layers,
import torchvision.transforms as tf

def transformation(img):
    tasks = tf.Compose([tf.Resize(400), tf.ToTensor(),
                        tf.Normalize((0.44, 0.44, 0.44), (0.22, 0.22, 0.22))])
    img = tasks(img)[:3, :, :].unsqueeze(0)
    return img
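The images are loaded with PIL and pushed through this transformation, along these lines (the file paths here are hypothetical, just to illustrate the assumed pipeline):

from PIL import Image

content_img = transformation(Image.open('content.jpg').convert('RGB'))
style_img = transformation(Image.open('style.jpg').convert('RGB'))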
How do I fix this? Is there another way to reconstruct the image from a convolutional layer's output?
Your tensor_to_image method only works for 3-channel images. Your input to the network has 3 channels, and so does the final output, which is why it works fine for those. But you cannot do the same with an internal high-dimensional activation. Essentially, the problem is that you try to apply a channel-wise normalization, but you only have parameters for three channels; that is why that particular line fails. You would need a 512-element vector of means and standard deviations. So, for example, something like this would work (a minimal sketch; internal activations have no predefined statistics, so the values below are just placeholders):
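import numpy as np

# 512-element per-channel statistics; a (512,) vector broadcasts
# cleanly against the (50, 50, 512) activation
mean = np.full(512, 0.44)
std = np.full(512, 0.22)

image = image * std + mean
image = image.clip(0, 1)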
However, the fundamental problem remains that you are trying to visualize a high-dimensional, 512-channel activation instead of a traditional 3-channel (RGB) image. You may try to visualize the channels separately, or in groups of 3, but even then it might not be very informative; see the sketch below.
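As a minimal sketch (assuming matplotlib and the target_content tensor from the question), the first few channels can be plotted as individual grayscale maps:

import matplotlib.pyplot as plt

# Visualize the first 16 channels of the conv4_2 activation
act = target_content.squeeze(0).detach().numpy()  # shape (512, 50, 50)
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(act[i], cmap='gray')
    ax.axis('off')
plt.show()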