Fastest method to do an FFT

3k Views Asked by At

I have the following very basic example of doing a 2D FFT using various interfaces.

import time
import numpy
import pyfftw
import multiprocessing

a = numpy.random.rand(2364,2756).astype('complex128')

start = time.time()
b1 = numpy.fft.fft2(a)
end1 = time.time() - start

start = time.time()
b2 = pyfftw.interfaces.scipy_fftpack.fft2(a, threads=multiprocessing.cpu_count())
end2 = time.time() - start

pyfftw.forget_wisdom()
start = time.time()
b3 = pyfftw.interfaces.numpy_fft.fft2(a, threads=multiprocessing.cpu_count())
end3 = time.time() - start

pyfftw.forget_wisdom()
start = time.time()
b4 = numpy.zeros_like(a)
fft = pyfftw.FFTW(a, b4, axes=(0,1), flags=('FFTW_ESTIMATE',),planning_timelimit=1.0)
fft()
end4 = time.time() - start

print('numpy.fft.fft2:                        %.3f secs.' % end1)
print('pyfftw.interfaces.scipy_fftpack.fft2:  %.3f secs.' % end2)
print('pyfftw.interfaces.numpy_fft.fft2:      %.3f secs.' % end3)
print('pyfftw.FFTW:                           %.3f secs.' % end4)

This generates the following results:

numpy.fft.fft2:                        1.878 secs.
pyfftw.interfaces.scipy_fftpack.fft2:  50.133 secs.
pyfftw.interfaces.numpy_fft.fft2:      52.136 secs.
pyfftw.FFTW:                           0.331 secs.

Clearly, the pyfftw.FFTW interface is the fastest, but doesn't work (I am not sure what I am doing wrong).

The pyfftw.interfaces.scipy_fftpack.fft2 and pyfftw.interfaces.numpy_fft.fft2 take a considerable amount of time, but I have determined that time to largely be in the planning phase, which only happens the first time. In my case, only one FFT2 and one IFFT2 will ever be performed (per process), so the planning is killing me. If either is run a second time without forgetting the wisdom, they run in about 0.33 seconds also (but this won't happen in my case).

So, the question is: 1. What am I doing wrong in the pyfftw.FFTW that is causing the data to be wrong? - or - 2. How can I changing the planning scheme and time limit for pyfftw.interfaces.scipy_fftpack.fft2 or pyfftw.interfaces.numpy_fft.fft2?

2

There are 2 best solutions below

0
On BEST ANSWER

The solution I found was to use the builders interface:

fft = pyfftw.builders.fft2(a, overwrite_input=True, planner_effort='FFTW_ESTIMATE', threads=multiprocessing.cpu_count())
b = fft()
3
On

Modified the code to use the pyfftw.FFTW class correctly making it the most efficient and have reduced the execution time by a factor of two with the "builder" class.

import time
import numpy
import pyfftw
import multiprocessing
nthread = multiprocessing.cpu_count()
a = numpy.random.rand(2364,2756).astype('complex128')
""" 
Uncomment below to use 32 bit floats, 
increasing the speed by a factor of 4
and remove the difference between the "builders" and "FFTW" methods
"""
#a = numpy.random.rand(2364,2756).astype('complex64')

start = time.time()
b1 = numpy.fft.fft2(a)
end1 = time.time() - start

start = time.time()
b2 = pyfftw.interfaces.scipy_fftpack.fft2(a, threads=nthread)
end2 = time.time() - start

pyfftw.forget_wisdom()
start = time.time()
b3 = pyfftw.interfaces.numpy_fft.fft2(a, threads=nthread)
end3 = time.time() - start

""" By far the most efficient method """
pyfftw.forget_wisdom()
start = time.time()
b4 = numpy.zeros_like(a)
fft = pyfftw.FFTW( a, b4, axes=(0,1), direction='FFTW_FORWARD', flags=('FFTW_MEASURE', ), threads=nthread, planning_timelimit=None )
fft()
end4 = time.time() - start

""" 
For large arrays avoiding the copy is very important, 
doing this I get a speedup of 2x compared to not using it 
"""
pyfftw.forget_wisdom()
start = time.time()
b5 = numpy.zeros_like(a)
fft = pyfftw.builders.fft2(a, s=None, axes=(-2, -1), overwrite_input=False, planner_effort='FFTW_MEASURE', threads=nthread, auto_align_input=False, auto_contiguous=False, avoid_copy=True)
b5 = fft()
end5 = time.time() - start



print('numpy.fft.fft2:                        %.3f secs.' % end1)
print('pyfftw.interfaces.scipy_fftpack.fft2:  %.3f secs.' % end2)
print('pyfftw.interfaces.numpy_fft.fft2:      %.3f secs.' % end3)
print('pyfftw.FFTW:                           %.3f secs.' % end4)
print('pyfftw.builders:                       %.3f secs.' % end5)

Example output times on my 4 core i5 CPU, using 64 bit floats:

numpy.fft.fft2:                        1.537 secs.
pyfftw.interfaces.scipy_fftpack.fft2:  0.248 secs.
pyfftw.interfaces.numpy_fft.fft2:      0.248 secs.
pyfftw.FFTW:                           0.084 secs.
pyfftw.builders:                       0.143 secs.

Example output times on my 4 core i5 CPU, using 32 bit floats:

numpy.fft.fft2:                        1.414 secs.
pyfftw.interfaces.scipy_fftpack.fft2:  0.066 secs.
pyfftw.interfaces.numpy_fft.fft2:      0.066 secs.
pyfftw.FFTW:                           0.043 secs.
pyfftw.builders:                       0.043 secs.