I have written some code to apply filters to an image using kernel convolution. Currently, it takes quite a long time, approximately 30 seconds for a 400x400 image. I understand that box blurs are much faster than Gaussian blurs. However, when I change my kernel to a box blur it seems to take as much time as the Gaussian blur. Any ideas?
import cv2
import numpy as np

img = cv2.imread('test.jpg')
img2 = img.copy()  # output image; border pixels keep their original values
height, width, channels = img.shape

# Candidate kernels (unnormalised integer weights)
GB3 = np.array([[1,2,1], [2,4,2], [1,2,1]])
GB5 = np.array([[1,4,6,4,1], [4,16,24,16,4], [6,24,36,24,6], [4,16,24,16,4], [1,4,6,4,1]])
BB = np.array([[1,1,1], [1,1,1], [1,1,1]])
kernel = BB

# Normalisation factor: the sum of all kernel weights (must start at 0, not 1)
kernel_sum = 0
for i in range(kernel.shape[0]):
    for j in range(kernel.shape[1]):
        kernel_sum += kernel[i][j]
#print(kernel_sum)

# Half-sizes of the kernel, so that the 5x5 kernel also stays inside the image
pad_y = kernel.shape[0] // 2
pad_x = kernel.shape[1] // 2

for x in range(pad_x, width - pad_x):
    for y in range(pad_y, height - pad_y):
        # Reset the accumulators for every pixel
        filtered_sum_b = 0
        filtered_sum_g = 0
        filtered_sum_r = 0
        for i in range(kernel.shape[0]):      # kernel row -> image row offset
            for j in range(kernel.shape[1]):  # kernel column -> image column offset
                weight = kernel[i][j]
                filtered_sum_b += img[y - pad_y + i, x - pad_x + j, 0] * weight
                filtered_sum_g += img[y - pad_y + i, x - pad_x + j, 1] * weight
                filtered_sum_r += img[y - pad_y + i, x - pad_x + j, 2] * weight
        # Normalise and clamp to the valid 8-bit range (OpenCV stores channels as B, G, R)
        img2[y, x, 0] = min(max(filtered_sum_b / kernel_sum, 0), 255)
        img2[y, x, 1] = min(max(filtered_sum_g / kernel_sum, 0), 255)
        img2[y, x, 2] = min(max(filtered_sum_r / kernel_sum, 0), 255)

# Enlarge both images for display
scale = 2
img_big = cv2.resize(img, (0,0), fx=scale, fy=scale)
img2_big = cv2.resize(img2, (0,0), fx=scale, fy=scale)
cv2.imshow('original', img_big)
cv2.imshow('processed', img2_big)
cv2.waitKey(0)
cv2.destroyAllWindows()
Use OpenCV's filter2D function for your convolutions.
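As for box blur versus Gaussian blur, the only difference is the weights: a Gaussian kernel has "interesting" weights, while a box kernel has none (they are all equal). For a general convolution like yours, that amounts to a few multiplications more or less per pixel, so two kernels of the same size cost about the same. Once the code is optimized, its execution time can be dominated by the time needed to transfer the data from RAM to the CPU. That applies to optimized code, though, not to pure Python loops, where interpreter overhead dominates regardless of the kernel values.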
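To illustrate the point about weights, here is a rough NumPy-only sketch (not what OpenCV does internally) that moves the per-pixel work out of Python: it loops over the kernel entries only and adds one weighted, shifted copy of the image per entry. The function name convolve_valid is hypothetical, and the sketch assumes an odd-sized kernel and a uint8 image. Like the code in the question, it leaves the border pixels untouched; unlike it, it rounds to the nearest integer instead of truncating.

```python
import numpy as np

def convolve_valid(img, kernel):
    """Weighted sum of shifted image slices; border pixels are left as they are."""
    kernel = np.asarray(kernel, dtype=np.float64)
    kh, kw = kernel.shape
    py, px = kh // 2, kw // 2
    h, w = img.shape[:2]
    # Accumulator for the interior region only (works for grey and colour images)
    acc = np.zeros((h - 2 * py, w - 2 * px) + img.shape[2:], dtype=np.float64)
    for ky in range(kh):
        for kx in range(kw):
            # One multiply-add over the whole image per kernel entry
            acc += kernel[ky, kx] * img[ky:ky + h - 2 * py, kx:kx + w - 2 * px]
    out = img.copy()
    out[py:h - py, px:w - px] = np.clip(np.rint(acc / kernel.sum()), 0, 255).astype(img.dtype)
    return out

# A single bright pixel spread evenly over its 3x3 neighbourhood by a box kernel
demo = np.zeros((5, 5), dtype=np.uint8)
demo[2, 2] = 90
box_result = convolve_valid(demo, np.ones((3, 3)))
```

A 3x3 box kernel and a 3x3 Gaussian kernel both take nine passes here, which is why swapping one for the other does not change the running time.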