Box blur is not any faster than Gaussian blur?


I have written some code to apply filters to an image using kernel convolution. Currently, it takes quite a long time, approximately 30 seconds for a 400x400 image. I understand that box blurs are much faster than Gaussian blurs. However, when I change my kernel to a box blur it seems to take as much time as the Gaussian blur. Any ideas?

import cv2
import numpy as np

img = cv2.imread('test.jpg')
img2 = cv2.imread('test.jpg')

height, width, channels = img.shape

GB3 = np.array([[1,2,1], [2,4,2], [1,2,1]])  # 3x3 Gaussian kernel
GB5 = np.array([[1,4,6,4,1], [4,16,24,16,4], [6,24,36,24,6], [4,16,24,16,4], [1,4,6,4,1]])  # 5x5 Gaussian kernel
BB = np.array([[1,1,1], [1,1,1], [1,1,1]])  # 3x3 box kernel

kernel = BB

# sum of the kernel weights, used to normalise each output pixel
kernel_sum = 0

# per-channel accumulators for the pixel currently being computed
filtered_sum_r = 0
filtered_sum_g = 0
filtered_sum_b = 0


for i in range(kernel.shape[0]):
    for j in range(kernel.shape[1]):
        kernel_sum += kernel[i][j]

# slide the kernel over every interior pixel (a 1-pixel border is skipped)
for x in range(1,width-1):
    for y in range(1,height-1):
        for i in range(kernel.shape[0]):
            for j in range(kernel.shape[1]):
                filtered_sum_b += img[y-1+j,x-1+i,0]*kernel[i][j]
                filtered_sum_g += img[y-1+j,x-1+i,1]*kernel[i][j]
                filtered_sum_r += img[y-1+j,x-1+i,2]*kernel[i][j]
        
        new_pixel_r = filtered_sum_r/kernel_sum
        new_pixel_g = filtered_sum_g/kernel_sum
        new_pixel_b = filtered_sum_b/kernel_sum

        # clamp each channel to the displayable 0-255 range
        if new_pixel_r>255:
            new_pixel_r = 255
        elif new_pixel_r<0: 
            new_pixel_r = 0

        if new_pixel_g>255:
            new_pixel_g = 255
        elif new_pixel_g<0: 
            new_pixel_g = 0

        if new_pixel_b>255:
            new_pixel_b = 255
        elif new_pixel_b<0: 
            new_pixel_b = 0

        img2[y,x,0] = new_pixel_b
        img2[y,x,1] = new_pixel_g
        img2[y,x,2] = new_pixel_r

        # reset the accumulators for the next pixel
        filtered_sum_r = 0
        filtered_sum_g = 0
        filtered_sum_b = 0

# enlarge both images for easier visual comparison
scale = 2
img_big = cv2.resize(img, (0,0), fx=scale, fy=scale) 
img2_big = cv2.resize(img2, (0,0), fx=scale, fy=scale) 


cv2.imshow('original', img_big)
cv2.imshow('processed', img2_big)

cv2.waitKey(0)
cv2.destroyAllWindows()
  • You are using Python loops, which will always be orders of magnitude slower than optimized binary code. Whenever possible, use library functions, e.g. NumPy and OpenCV, or write your critical code as compilable Cython.
  • Your code's access pattern is cache-unfriendly. Move along rows in the inner loop (for y: ... for x: ...), because that is how the image is stored in memory. In row-major storage a cache line holds several horizontally adjacent pixels; if you walk down columns instead, you use each cache line only once before it has to be fetched again.
  • Your code doesn't exploit the fact that both filters are separable: a 2-D box or Gaussian kernel is the outer product of two 1-D kernels, so one NxN pass can be replaced by two N-tap passes (one horizontal, one vertical), cutting the work per pixel from N² to 2N multiplications.
  • Convolution can also be expressed as elementwise multiplication in the frequency domain (DFT, multiply, inverse DFT), which is the usual way to perform convolutions with large kernels.
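To illustrate the separability point: the question's 5x5 Gaussian kernel (GB5) is the outer product of the 1-D kernel [1, 4, 6, 4, 1], so two 1-D passes reproduce the single 2-D pass. A sketch on a random stand-in image:

```python
import cv2
import numpy as np

# GB5 is the outer product of this 1-D kernel with itself.
k1d = np.array([1, 4, 6, 4, 1], dtype=np.float64)
k1d /= k1d.sum()                # normalise so the weights sum to 1
k2d = np.outer(k1d, k1d)        # full 5x5 kernel (equals GB5 / 256)

img = np.random.default_rng(0).random((64, 64, 3))

full = cv2.filter2D(img, -1, k2d)           # 25 multiplies per pixel
sep = cv2.sepFilter2D(img, -1, k1d, k1d)    # 5 + 5 multiplies per pixel
```

The two results agree to floating-point tolerance; the separable version simply does less arithmetic per pixel.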
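And a minimal sketch of the frequency-domain route using NumPy's FFT, on a random single-channel image (the names img and k are illustrative; note the result is a circular convolution, so real code pads the image to avoid wrap-around at the borders):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))          # stand-in single-channel image
k = np.ones((3, 3)) / 9.0           # normalised 3x3 box kernel

# DFT both (kernel zero-padded to the image size), multiply, inverse DFT.
K = np.fft.rfft2(k, s=img.shape)
F = np.fft.rfft2(img)
out = np.fft.irfft2(F * K, s=img.shape)

# out[i, j] is the circular convolution: a sum over the kernel taps with
# wrap-around indexing into the image.
```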

Use OpenCV's filter2D function for your convolutions.

As for box blur vs. Gaussian: the only difference is weighted taps vs. no weights (all equal), which amounts to a few extra multiplications at most. Once the code is optimized, execution time tends to be dominated by moving the data from RAM to the CPU, so the two blurs cost about the same. That holds for optimized code; in pure Python loops, interpreter overhead swamps everything, which is why you see no difference there either.