PyFFTW slower than SciPy FFT?

I tried the solution presented here on Stack Overflow by user henry-gomersall to reproduce the speed-up of FFT-based convolution, but obtained a different result.

import numpy as np
import pyfftw
import scipy.signal
import timeit

class CustomFFTConvolution(object):

    def __init__(self, A, B, threads=1):

        shape = (np.array(A.shape) + np.array(B.shape))-1

        if np.iscomplexobj(A) and np.iscomplexobj(B):
            self.fft_A_obj = pyfftw.builders.fftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.fftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.ifftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

        else:
            self.fft_A_obj = pyfftw.builders.rfftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.rfftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.irfftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

    def __call__(self, A, B):

        fft_padded_A = self.fft_A_obj(A)
        fft_padded_B = self.fft_B_obj(B)

        return self.ifft_obj(fft_padded_A * fft_padded_B)

N = 200

A = np.random.rand(N, N, N)
B = np.random.rand(N, N, N)

start_time = timeit.default_timer()

C = scipy.signal.fftconvolve(A,B,"same")
print timeit.default_timer() - start_time

custom_fft_conv_nthreads = CustomFFTConvolution(A, B, threads=1)
C = custom_fft_conv_nthreads(A, B)
print timeit.default_timer() - start_time

PyFFTW is approximately 7x slower than SciPy FFT here, which differs from other users' experiences. What is wrong in this code? I am using Python 2.7.9 and PyFFTW 0.9.2.

There is 1 best solution below


You're not doing what you think you're doing, and what you think you're doing you shouldn't be doing either.

You're not doing what you think you're doing because your code above only defines start_time once, so your timing for pyfftw includes not only the time-consuming creation of the CustomFFTConvolution object, but also the scipy convolution!
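The effect of a single shared start_time can be sketched without any FFT library. In the snippet below, slow_step and fast_step are hypothetical stand-ins (plain sleeps) for the scipy and pyfftw calls; they are assumptions for illustration, not part of the original code.

```python
import time
import timeit


def slow_step():
    # Hypothetical stand-in for the first (scipy) measurement.
    time.sleep(0.2)


def fast_step():
    # Hypothetical stand-in for the second (pyfftw) measurement.
    time.sleep(0.01)


# Pitfall: the clock is started once, so the second reading is cumulative.
start_time = timeit.default_timer()
slow_step()
first_reading = timeit.default_timer() - start_time
fast_step()
cumulative_reading = timeit.default_timer() - start_time

# Fix: restart the clock immediately before the second measurement.
start_time = timeit.default_timer()
fast_step()
isolated_reading = timeit.default_timer() - start_time

print("cumulative: %.3f s, isolated: %.3f s"
      % (cumulative_reading, isolated_reading))
```

The cumulative reading contains the whole first step, which is why the second measurement in the question looks so much slower than it really is.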

You shouldn't be doing what you think you're doing because you should use timeit to test this sort of thing.
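As a rough sketch of what that looks like outside IPython, the timeit module can time a callable directly. PlannedOperation below is a hypothetical stand-in for an object with an expensive constructor and a cheap call (sleeps replace the real work); it is an assumption for illustration, not the pyfftw API.

```python
import time
import timeit


class PlannedOperation(object):
    """Hypothetical stand-in: costly one-off setup, cheap repeated calls."""

    def __init__(self):
        # Stand-in for the one-off planning work done at construction.
        time.sleep(0.1)

    def __call__(self):
        # Stand-in for the computation that is actually being benchmarked.
        time.sleep(0.01)


# Build the object outside the timed region so setup cost is excluded.
op = PlannedOperation()

# Three repeats of one call each; report the best of the three.
timings = timeit.repeat(op, repeat=3, number=1)
best = min(timings)
print("best of 3: %.3f s per call" % best)
```

Taking the minimum of several repeats reduces the influence of other activity on the machine, and constructing the object first keeps the setup cost out of the measurement.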

So, with some file foo.py:

import numpy as np
import pyfftw
import scipy.signal

class CustomFFTConvolution(object):

    def __init__(self, A, B, threads=1):

        shape = (np.array(A.shape) + np.array(B.shape))-1

        if np.iscomplexobj(A) and np.iscomplexobj(B):
            self.fft_A_obj = pyfftw.builders.fftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.fftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.ifftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

        else:
            self.fft_A_obj = pyfftw.builders.rfftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.rfftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.irfftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

    def __call__(self, A, B):

        fft_padded_A = self.fft_A_obj(A)
        fft_padded_B = self.fft_B_obj(B)

        return self.ifft_obj(fft_padded_A * fft_padded_B)

N = 200

A = np.random.rand(N, N, N)
B = np.random.rand(N, N, N)

In ipython, you can get the following:

In [1]: %run foo.py

In [2]: timeit scipy.signal.fftconvolve(A,B,"same")
1 loops, best of 3: 8.38 s per loop

In [3]: custom_fft_conv_nthreads = CustomFFTConvolution(A, B, threads=1)

In [4]: timeit custom_fft_conv_nthreads(A, B)
1 loops, best of 3: 6.9 s per loop

and with multiple threads:

In [5]: custom_fft_conv_nthreads = CustomFFTConvolution(A, B, threads=4)

In [6]: timeit custom_fft_conv_nthreads(A, B)
1 loops, best of 3: 3.81 s per loop
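
For reference, the speed-up implied by the three timings above can be worked out directly. This is plain arithmetic on the reported figures; the variable names are illustrative only.

```python
# Timings reported above, in seconds per loop.
scipy_time = 8.38
pyfftw_1_thread = 6.9
pyfftw_4_threads = 3.81

# Speed-up of each pyfftw run relative to scipy.signal.fftconvolve.
speedup_1_thread = scipy_time / pyfftw_1_thread
speedup_4_threads = scipy_time / pyfftw_4_threads

print("1 thread: %.2fx, 4 threads: %.2fx"
      % (speedup_1_thread, speedup_4_threads))
# prints "1 thread: 1.21x, 4 threads: 2.20x"
```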

If you correct your code to do what you think it's doing by inserting start_time = timeit.default_timer() before C = custom_fft_conv_nthreads(A, B), you get something closer to what is expected:

10.8795630932
8.31241607666