I am trying to implement 2-D convolution in Python. I have an input image set of dimensions (m, 64, 64, 3), where m is the number of images. I want to use a filter size f=8 and stride=8 for both height and width, and SAME padding so that the input width and height (64, 64) are preserved.
Using the formula [n' = floor((n-f+2*pad)/stride + 1)] and putting n'=64, n=64, stride=8, f=8, I get pad=224, which is unreasonably large.
For example, when I took m, the number of images, as 1080, it presumably resulted in a memory error and my system crashed.
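To make the size of the problem concrete, the arithmetic can be sketched as follows. The float64 dtype is an assumption (np.pad keeps the dtype of its input, so a uint8 array would need 8 times less memory); all other numbers are the ones given above.

```python
# Solve n_out = (n - f + 2*pad)/stride + 1 for pad, using the values from the question.
n, f, stride, n_out = 64, 8, 8, 64
pad = ((n_out - 1) * stride - n + f) // 2
print(pad)  # 224

# Shape of the padded batch for m = 1080 RGB images.
m, channels = 1080, 3
padded_shape = (m, n + 2 * pad, n + 2 * pad, channels)
print(padded_shape)  # (1080, 512, 512, 3)

# Memory for the padded array alone, ASSUMING float64 (8 bytes per element).
bytes_needed = 8 * m * (n + 2 * pad) ** 2 * channels
print(round(bytes_needed / 1e9, 2))  # 6.79 (GB)
```

Under that assumption the padded array alone takes several gigabytes, before the output array is even allocated.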
But when I used the Keras library and the following code, it worked fine.
    X = keras.layers.Conv2D(filters=32, kernel_size=(8, 8), strides=(8, 8), padding='same')(X)
Here is my implementation of Conv2D in Python:
    import numpy as np

    # A.shape = (1080, 64, 64, 3)
    # W.shape = (8, 8, 3, 32)
    # b.shape = (32,)
    def conv_fwd(A, W, b, pad=0, stride=1):
        # Zero-pad only the two spatial axes; batch and channel axes are untouched.
        pad_A = np.pad(A, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant')
        (m, w, h, nc) = A.shape
        (fw, fh, ncc, ncn) = W.shape
        if nc != ncc:
            raise ValueError('Number of channels in kernel and input do not match')
        # Output size per spatial axis: floor((n - f + 2*pad) / stride) + 1
        wn = (w - fw + 2*pad) // stride + 1
        hn = (h - fh + 2*pad) // stride + 1
        A_n = np.zeros((m, wn, hn, ncn))
        # Flatten the filters so one matrix product covers every output channel.
        W = W.reshape(fw*fh*ncc, ncn)
        for i in range(wn):
            for j in range(hn):
                # Flatten the window at (i, j) for all m images and apply the filters.
                A_n[:, i, j] = pad_A[:, i*stride:i*stride+fw, j*stride:j*stride+fh].reshape(m, fw*fh*nc).dot(W) + b
        return A_n
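A quick shape check of this function on a single random image is sketched below. The random inputs and the batch size of 1 are assumptions made only to keep the run small, and the function is repeated so that the snippet runs on its own.

```python
import numpy as np

def conv_fwd(A, W, b, pad=0, stride=1):
    # Same logic as the implementation above, repeated for a standalone run.
    pad_A = np.pad(A, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant')
    (m, w, h, nc) = A.shape
    (fw, fh, ncc, ncn) = W.shape
    if nc != ncc:
        raise ValueError('Number of channels in kernel and input do not match')
    wn = (w - fw + 2*pad) // stride + 1
    hn = (h - fh + 2*pad) // stride + 1
    A_n = np.zeros((m, wn, hn, ncn))
    W = W.reshape(fw*fh*ncc, ncn)
    for i in range(wn):
        for j in range(hn):
            window = pad_A[:, i*stride:i*stride+fw, j*stride:j*stride+fh]
            A_n[:, i, j] = window.reshape(m, fw*fh*nc).dot(W) + b
    return A_n

# Illustrative inputs only: one random 64x64 RGB image and random filters.
rng = np.random.default_rng(0)
A = rng.standard_normal((1, 64, 64, 3))
W = rng.standard_normal((8, 8, 3, 32))
b = np.zeros(32)

print(conv_fwd(A, W, b, pad=224, stride=8).shape)  # (1, 64, 64, 32)
print(conv_fwd(A, W, b, pad=0, stride=8).shape)    # (1, 8, 8, 32)
```

With pad=224 the output keeps the 64 x 64 size, at the cost of a 512 x 512 padded input per image; with pad=0 the output is 8 x 8.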
So I'm assuming there is a different process for calculating the padding in Keras. I tried looking for the source code, but couldn't find it. How does it work?
In the formula
n' = floor((n - f + 2*pad)/stride + 1)
you have taken n' == n == 64. That is not correct.
n' is equal to n only when the stride is equal to 1, but here the stride is greater than 1 (it is 8). That is the reason you are getting a very high value for the padding.

Now, as your goal is to find the value of the padding, I have a solution/workaround (which might not be very optimized). First, build the model with padding='same' and look at its summary: the shape of the image is reduced from (64, 64) to (32, 32) even though padding is 'same'.

Now, build the model with padding='valid' and look at its summary as well.
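If we observe, the shape of the convolutional layer is (None, 31, 31, 64).

Now, the padding can be obtained as the difference between the two output sizes, i.e., 32 - 31 = 1. In this example, the input is padded with 1 row and 1 column of zeros.

Applying the same check to your case, with input shape = (64, 64, 3), filter size = 8 and strides = 8: padding='same' gives an output size of ceil(64/8) = 8, and padding='valid' gives floor((64 - 8)/8) + 1 = 8. The difference is 8 - 8 = 0, so Keras adds no zero padding here, and the output is 8 x 8 rather than 64 x 64.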
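For reference, the rule TensorFlow documents for 'same' padding can be sketched directly: the output size is ceil(n / stride), and only as many zeros are added as that output size requires, with any odd leftover going to the bottom/right side. The sketch below covers one spatial dimension; the helper name same_padding is hypothetical and is not a Keras or TensorFlow function.

```python
import math

def same_padding(n, f, stride):
    """Return (output_size, pad_before, pad_after) for one spatial dimension.

    Hypothetical helper following the documented 'same' rule:
    output = ceil(n / stride), total padding = max((output - 1)*stride + f - n, 0).
    """
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + f - n, 0)
    before = total // 2       # top / left
    after = total - before    # bottom / right gets the extra zero, if any
    return out, before, after

print(same_padding(64, 8, 8))  # (8, 0, 0)
print(same_padding(64, 8, 1))  # (64, 3, 4)
```

For the values in the question (n=64, f=8, stride=8) this gives an 8 x 8 output with no padding, so calling the question's conv_fwd with pad=0 and stride=8 should produce the same output size as the Keras layer. Only with stride 1 does 'same' keep the 64 x 64 size, and then it needs just 7 zeros in total per dimension.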