I have a large collection of large images (e.g. 15000x15000 pixels) that I would like to blur. I need to blur the images using a distance function, so that the further I move from certain areas of the image, the heavier the blurring should be. I have a distance map describing how far a given pixel is from those areas.
Because of the large number of images, I have to consider performance. I have looked at NumPy/SciPy; they have some great functions, but they seem to use a fixed kernel size, and I need to reduce or increase the kernel size depending on the distance to the previously mentioned areas.
How can I solve this problem in Python?
UPDATE: My solution so far based on the answer by rth:
# cython: boundscheck=False
# cython: cdivision=True
# cython: wraparound=False
import numpy as np
cimport numpy as np

def variable_average(int [:, ::1] data, int[:,::1] kernel_size):
    cdef int width, height, i, j, ii, jj
    height = data.shape[0]  # number of rows
    width = data.shape[1]   # number of columns
    cdef double [:, ::1] data_blurred = np.empty([height, width])
    cdef double res
    cdef int sigma, weight
    # i runs over rows and j over columns, matching the data[i, j] indexing
    for i in range(height):
        for j in range(width):
            weight = 0
            res = 0
            sigma = kernel_size[i, j]
            # average over the window around (i, j), skipping pixels outside the image
            for ii in range(i - sigma, i + sigma + 1):
                for jj in range(j - sigma, j + sigma + 1):
                    if ii < 0 or ii >= height or jj < 0 or jj >= width:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res/weight
    return data_blurred
Test:
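# dtype=np.intc matches the C "int" memoryviews declared in variable_average
data = np.random.randint(256, size=(1024,1024), dtype=np.intc)
kernel = np.random.randint(256, size=(1024,1024), dtype=np.intc) + 1
result = np.asarray(variable_average(data, kernel))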
With the above settings the method takes around 186 seconds to run. Is that what I can expect to ultimately squeeze out of the method, or are there optimizations that I can use to further increase the performance (still using Python)?
As you have noted, the related scipy functions do not support variable-size blurring. You could implement this in pure Python with for loops, then use Cython, Numba or PyPy to get C-like performance.
Here is a low-level Python implementation, which uses numpy only for data storage, that calculates an arithmetic average of pixels with a variable kernel size. It is a bad implementation with respect to numpy, in the sense that it is not vectorized. However, this makes it convenient to port to other high-performance solutions.
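A minimal sketch of such a loop-based implementation, assuming a square window of half-width `kernel_size[i, j]` around each pixel that is clipped at the image borders. The function name `variable_blur` and the exact window convention are illustrative assumptions:

```python
import numpy as np


def variable_blur(data, kernel_size):
    """Arithmetic average over a per-pixel window (illustrative sketch).

    data        : 2-D array of pixel values
    kernel_size : 2-D integer array of the same shape; half-width of the
                  averaging window at each pixel (assumed convention)
    """
    height, width = data.shape
    data_blurred = np.empty((height, width), dtype=np.float64)

    for i in range(height):          # rows
        for j in range(width):       # columns
            res = 0.0
            weight = 0
            sigma = kernel_size[i, j]
            for ii in range(i - sigma, i + sigma + 1):
                for jj in range(j - sigma, j + sigma + 1):
                    # Skip window positions that fall outside the image.
                    if ii < 0 or ii >= height or jj < 0 or jj >= width:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res / weight
    return data_blurred
```

Written this way it is slow in plain CPython, but every operation is a scalar loop or index, which is what the compiled approaches handle well.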
Cython: simply statically typing the variables and compiling should give you C-like performance; see this post for a complete example, as well as the compilation notes.
Numba: wrapping the above function with the @jit decorator should be mostly sufficient.
PyPy: installing PyPy plus the experimental numpy branch could be another alternative worth trying, although you would then have to use PyPy for all your code, which might not be possible at present.
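As an illustration of the Numba option, here is a sketch under a few assumptions: it uses `njit` (Numba's shorthand for `jit(nopython=True)`), the name `variable_blur_jit` is made up for this example, and the window is clipped once per pixel rather than tested per element. The import fallback is only there so the sketch still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the loops below to machine code
except ImportError:
    # Assumption for this sketch: fall back to plain Python without Numba.
    def njit(func):
        return func


@njit
def variable_blur_jit(data, kernel_size):
    height, width = data.shape
    out = np.empty((height, width), dtype=np.float64)
    for i in range(height):
        for j in range(width):
            sigma = kernel_size[i, j]
            # Clip the window to the image once, instead of per element.
            i0 = max(i - sigma, 0)
            i1 = min(i + sigma + 1, height)
            j0 = max(j - sigma, 0)
            j1 = min(j + sigma + 1, width)
            res = 0.0
            for ii in range(i0, i1):
                for jj in range(j0, j1):
                    res += data[ii, jj]
            # The clipped window is never empty because it contains (i, j).
            out[i, j] = res / ((i1 - i0) * (j1 - j0))
    return out
```

The first call includes compilation time, so timings are best taken from a second call on arrays of the same dtype.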
Once you have a fast implementation, you can then use multiprocessing, etc. to process different images in parallel, if need be, or even parallelize the outer for loop with OpenMP in Cython.
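Processing several images in parallel could look roughly like the sketch below. The helper names `blur_pair` and `blur_all` are hypothetical, and the slow slicing-based `variable_blur` stands in for whichever fast implementation you end up with.

```python
import numpy as np
from multiprocessing import Pool


def variable_blur(data, kernel_size):
    # Placeholder for the fast (Cython/Numba) version; same window convention.
    height, width = data.shape
    out = np.empty((height, width), dtype=np.float64)
    for i in range(height):
        for j in range(width):
            s = kernel_size[i, j]
            # Slicing past the array end is clipped automatically;
            # the lower bound has to be clipped by hand.
            window = data[max(i - s, 0):i + s + 1, max(j - s, 0):j + s + 1]
            out[i, j] = window.mean()
    return out


def blur_pair(pair):
    # Pool.map passes a single argument, so image and kernel map travel as a tuple.
    data, kernel_size = pair
    return variable_blur(data, kernel_size)


def blur_all(pairs, processes=None):
    # One task per image; processes=None lets Pool use all available cores.
    with Pool(processes=processes) as pool:
        return pool.map(blur_pair, pairs)
```

Call `blur_all` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For images as large as 15000x15000 it is probably cheaper to pass file paths and load each image inside the worker, since arrays passed to `Pool.map` are pickled and copied between processes.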