It is known that convolution in the time domain is equivalent to the element-wise product of the FFTs of the input and the filter.
The time-scaling property says that if $x(t) \leftrightarrow X(f)$, then

$$x(at) \;\leftrightarrow\; \frac{1}{|a|}\,X\!\left(\frac{f}{a}\right)$$
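For discrete signals, the counterpart of time scaling is decimation, and its effect in the frequency domain is aliasing (folding) rather than a simple stretch. If $x_d[n] = x[Mn]$ for a length-$N$ signal with $N = ML$, the length-$L$ DFT of the decimated signal is

$$X_d[k] \;=\; \frac{1}{M}\sum_{m=0}^{M-1} X[k + mL], \qquad k = 0, \dots, L-1.$$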
So my question is: can I apply downsampling (strides > 1, as in a convolutional network) in the frequency domain and obtain the same result as the strided convolution in the time domain? An example in Python with a downsampling factor of two (strides = 2) is:
import numpy as np

# Dimensions
dims_img = [256, 256] # Image dimensions
dims_kernel = [3, 3] # Kernel dimensions
# Number of FFT points per axis for a full linear convolution: N + K - 1
fft_dims = [dims_img[0] + dims_kernel[0] - 1, dims_img[1] + dims_kernel[1] - 1]
img = np.random.random(dims_img) # Random values between 0 and 1 for the image
kernel = np.random.normal(0, 0.5, dims_kernel) # Random normal values for the kernel
# Image and kernel FFTs (zero-padded to fft_dims)
fft_img = np.fft.fft2(img, fft_dims) # Image FFT
fft_kernel = np.fft.fft2(kernel, fft_dims) # Kernel FFT
# Element-wise product to perform the 2D convolution in the frequency domain
fft_conv2d = fft_img * fft_kernel
# Now, to realize the downsampling, I convert my
# signal (fft_conv2d) back to the time domain and
# apply the downsampling (strides) there
time_conv2d = np.fft.ifft2(fft_conv2d).real # Convert the signal to the time domain
# Apply downsampling to the signal along both axes
down_conv2d = time_conv2d[::2, ::2] # I keep only one of every two samples
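To check that the FFT path really computes a strided convolution, it can be compared against a direct (sliding-sum) full convolution on a small example. This is a minimal sketch using only NumPy, with small hypothetical sizes (an 8×8 image and a 3×3 kernel) in place of the ones above:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))               # small test image
kernel = rng.normal(0, 0.5, (3, 3))    # small test kernel

# Full linear convolution via FFT, zero-padded to 8 + 3 - 1 = 10 points per axis
fft_dims = (10, 10)
fft_conv = np.fft.ifft2(np.fft.fft2(img, fft_dims)
                        * np.fft.fft2(kernel, fft_dims)).real

# Direct full 2D convolution as a reference
direct = np.zeros(fft_dims)
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        direct[i:i + 3, j:j + 3] += img[i, j] * kernel

# Stride-2 convolution = full convolution, then keep every second sample
assert np.allclose(fft_conv[::2, ::2], direct[::2, ::2])
```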
I understand the time-scaling property, but I think I need to apply it before the element-wise product of the FFTs; however, when I do it that way, the convolution result is wrong. Is this correct?
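One way to perform the downsampling entirely in the frequency domain is to fold (alias) the spectrum after the element-wise product: decimating by 2 in time corresponds to averaging the shifted copies of the already-multiplied convolution spectrum, and this aliasing step does not commute with the multiplication, which is why applying it to the individual FFTs first gives a wrong result. A minimal NumPy sketch with small hypothetical sizes (the FFT length is even so the spectrum folds cleanly by 2):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((8, 8))
kernel = rng.normal(0, 0.5, (3, 3))

N = 10  # 8 + 3 - 1 = 10, even, so it folds cleanly by 2
X = np.fft.fft2(img, (N, N)) * np.fft.fft2(kernel, (N, N))  # convolution spectrum

# Downsampling by 2 along both axes = average the 2x2 aliased copies of the spectrum
H = N // 2
folded = (X[:H, :H] + X[H:, :H] + X[:H, H:] + X[H:, H:]) / 4.0

down_freq = np.fft.ifft2(folded).real        # 5x5 result, no time-domain striding
down_time = np.fft.ifft2(X).real[::2, ::2]   # reference: decimate in the time domain
assert np.allclose(down_freq, down_time)
```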
Some related questions I have seen are:
Using FFT-Convolution when stride>1
https://dsp.stackexchange.com/questions/66464/handling-stride1-in-fft-based-convolution
https://inst.eecs.berkeley.edu/~ee123/sp18/Sections/sec6.pdf