How does SAME padding work in convolutional neural networks when the stride is greater than 1?

I am trying to implement 2-D convolution in python. I have an input image set of dimensions (m, 64, 64, 3), where m is the number of images. I want to use a filter size f=8 and stride=8 for both height and width, and SAME padding so that input width and height (64, 64) are preserved.

Using the formula [n' = floor((n-f+2*pad)/stride + 1)] and putting n'=64, n=64, stride=8, f=8, I get pad=224, which is unreasonably large.

For example, when I took m, the number of images, as 1080, it presumably resulted in a memory error and my system crashed.
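A rough back-of-the-envelope check of that memory figure, assuming the padded array is float64 (NumPy's default for np.zeros; the dtype of A is not stated here, so this is an assumption):

```python
# Estimate the size of the padded array for pad=224 on each side.
m, n, channels, pad = 1080, 64, 3, 224

padded_side = n + 2 * pad                      # 64 + 448 = 512
num_elements = m * padded_side * padded_side * channels
bytes_needed = num_elements * 8                # 8 bytes per float64 (assumed dtype)

gib = bytes_needed / 2**30
print(f"{gib:.2f} GiB")                        # 6.33 GiB
```

Under that assumption, the padded input alone needs more than 6 GiB, which would be consistent with the crash described above.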

But when I used the Keras library and the following code, it worked fine.

X = keras.layers.Conv2D(filters=32, kernel_size=(8, 8), strides=(8, 8), padding='same')(X)

Here is my implementation of the Conv2D in python:

import numpy as np

# A.shape = (1080, 64, 64, 3)
# W.shape = (8, 8, 3, 32)
# b.shape = (32,)

def conv_fwd(A, W, b, pad=0, stride=1):
    pad_A = np.pad(A, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant')
    (m, w, h, nc) = A.shape
    (fw, fh, ncc, ncn) = W.shape

    if nc != ncc:
        raise Exception('Number of channels in kernel and input do not match')

    wn = int((w-fw+2*pad)/stride + 1)
    hn = int((h-fh+2*pad)/stride + 1)
    A_n = np.zeros((m, wn, hn, ncn))
    W = W.reshape(fw*fh*ncc, ncn)

    for i in range(wn):
        for j in range(hn):
            A_n[:, i, j] = pad_A[:, i*stride:i*stride+fw, j*stride:j*stride+fh].reshape(m, fw*fh*nc).dot(W) + b
    return A_n

So I'm assuming there is a different process for calculating the padding in keras. I tried looking for the source code, but couldn't find it. How does it work?

There is 1 best solution below

In the formula n' = floor((n - f + 2*pad)/stride + 1), you have set n' = n = 64.

That is not correct. With SAME padding, n' equals n only when the stride is 1. Here the stride is 8, and TensorFlow documents the SAME output size as ceil(n / stride), so n' = ceil(64 / 8) = 8.

That is why you are getting such a large value for the padding.
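To make the arithmetic concrete, here is a minimal sketch of the output-size and total-padding rules, assuming the SAME convention that TensorFlow documents. The function names are hypothetical helpers for illustration, not part of Keras or TensorFlow:

```python
import math

def same_output_size(n, stride):
    # SAME: the output size depends only on the input size and the stride.
    return math.ceil(n / stride)

def valid_output_size(n, f, stride):
    # VALID: no padding, so only full windows are counted.
    return (n - f) // stride + 1

def same_total_padding(n, f, stride):
    # Total zeros added along one dimension (both sides combined).
    out = same_output_size(n, stride)
    return max((out - 1) * stride + f - n, 0)

# The case from the question: n=64, f=8, stride=8.
print(same_output_size(64, 8))        # 8
print(same_total_padding(64, 8, 8))   # 0

# A smaller kernel and stride: n=64, f=3, stride=2.
print(same_output_size(64, 2))        # 32
print(valid_output_size(64, 3, 2))    # 31
print(same_total_padding(64, 3, 2))   # 1
```

So for the question's settings the output is 8 x 8 and no zeros need to be added at all, rather than the 224 per side obtained by forcing n' = 64.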

Since your goal is to find the value of the padding, here is a workaround (it may not be the most efficient way to do it).

First, build the model with padding='same', as shown below:

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=64, strides=(2, 2), kernel_size=(3, 3),
                                 input_shape=(64, 64, 3), padding='same'))
model.summary()  # summary() prints the table itself; wrapping it in print() also prints "None"

Summary of the Model with Padding = Same is shown below:

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_25 (Conv2D)           (None, 32, 32, 64)        1792      
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0

Note that the spatial size is reduced from (64, 64) to (32, 32) even though padding='same', because the stride is 2.

Now, build the Model with Padding = Valid, as shown below:

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=64, strides=(2, 2), kernel_size=(3, 3),
                                 input_shape=(64, 64, 3), padding='valid'))
model.summary()  # summary() prints the table itself; wrapping it in print() also prints "None"

Summary for the above Model is shown below:

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_24 (Conv2D)           (None, 31, 31, 64)        1792      
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0

Here the output shape of the convolutional layer is (None, 31, 31, 64).

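For this example, the padding can be read off as the difference between the two output sizes:

Height with SAME padding - Height with VALID padding

or

Width with SAME padding - Width with VALID padding

i.e., 32 - 31 = 1, so the input is padded with 1 row and 1 column of zeros.

This difference is only a quick check and does not hold in general (for example, with filter size 5 and stride 2 the difference is 2 but the total padding is 3). TensorFlow documents the SAME rule as: output size = ceil(n / stride), and total padding = max((output size - 1) * stride + f - n, 0), split as evenly as possible, with any extra row or column going at the bottom or right.

In your case, with input shape (64, 64, 3), filter size 8 and stride 8, the output size is ceil(64 / 8) = 8 and the total padding is max(7 * 8 + 8 - 64, 0) = 0, i.e., the input is not padded at all and the output is (8, 8), not (64, 64).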
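As a cross-check that does not need Keras, here is a minimal NumPy sketch of SAME-style zero padding, assuming the rule TensorFlow documents (total padding max((ceil(n / stride) - 1) * stride + f - n, 0), with the extra row or column placed at the end). The helper names same_pad_amounts and pad_same are hypothetical and are not taken from the Keras source:

```python
import numpy as np

def same_pad_amounts(n, f, stride):
    """Return (pad_before, pad_after) for one spatial dimension."""
    out = -(-n // stride)                        # ceil(n / stride) with integers
    total = max((out - 1) * stride + f - n, 0)   # zeros needed in total
    before = total // 2                          # smaller half goes first
    return before, total - before

def pad_same(A, f, stride):
    """Zero-pad a batch of shape (m, dim1, dim2, channels) SAME-style."""
    _, d1, d2, _ = A.shape
    widths = ((0, 0),
              same_pad_amounts(d1, f, stride),
              same_pad_amounts(d2, f, stride),
              (0, 0))
    return np.pad(A, widths, mode='constant')

A = np.ones((2, 64, 64, 3))
print(pad_same(A, f=8, stride=8).shape)   # (2, 64, 64, 3): nothing added
print(pad_same(A, f=3, stride=2).shape)   # (2, 65, 65, 3): one row and column added
```

The padded array could then be passed to the conv_fwd function from the question with pad=0, which should give an output of 8 x 8 for f=8, stride=8.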