Fastest method to do an FFT

Question

Fastest method to do an FFT

3.1k Views Asked by steveo225 At 02 July 2025 at 19:29

I have the following very basic example of doing a 2D FFT using various interfaces.

import time
import numpy
import pyfftw
import multiprocessing

a = numpy.random.rand(2364,2756).astype('complex128')

start = time.time()
b1 = numpy.fft.fft2(a)
end1 = time.time() - start

start = time.time()
b2 = pyfftw.interfaces.scipy_fftpack.fft2(a, threads=multiprocessing.cpu_count())
end2 = time.time() - start

pyfftw.forget_wisdom()
start = time.time()
b3 = pyfftw.interfaces.numpy_fft.fft2(a, threads=multiprocessing.cpu_count())
end3 = time.time() - start

pyfftw.forget_wisdom()
start = time.time()
b4 = numpy.zeros_like(a)
fft = pyfftw.FFTW(a, b4, axes=(0,1), flags=('FFTW_ESTIMATE',),planning_timelimit=1.0)
fft()
end4 = time.time() - start

print('numpy.fft.fft2:                        %.3f secs.' % end1)
print('pyfftw.interfaces.scipy_fftpack.fft2:  %.3f secs.' % end2)
print('pyfftw.interfaces.numpy_fft.fft2:      %.3f secs.' % end3)
print('pyfftw.FFTW:                           %.3f secs.' % end4)

This generates the following results:

numpy.fft.fft2:                        1.878 secs.
pyfftw.interfaces.scipy_fftpack.fft2:  50.133 secs.
pyfftw.interfaces.numpy_fft.fft2:      52.136 secs.
pyfftw.FFTW:                           0.331 secs.

Clearly, the pyfftw.FFTW interface is the fastest, but doesn't work (I am not sure what I am doing wrong).

The pyfftw.interfaces.scipy_fftpack.fft2 and pyfftw.interfaces.numpy_fft.fft2 take a considerable amount of time, but I have determined that time to largely be in the planning phase, which only happens the first time. In my case, only one FFT2 and one IFFT2 will ever be performed (per process), so the planning is killing me. If either is run a second time without forgetting the wisdom, they run in about 0.33 seconds also (but this won't happen in my case).

So, the question is: 1. What am I doing wrong in the pyfftw.FFTW that is causing the data to be wrong? - or - 2. How can I changing the planning scheme and time limit for pyfftw.interfaces.scipy_fftpack.fft2 or pyfftw.interfaces.numpy_fft.fft2?

Original Q&A

There are 2 best solutions below

Josh On 29 April 2016 at 14:21

Modified the code to use the pyfftw.FFTW class correctly making it the most efficient and have reduced the execution time by a factor of two with the "builder" class.

import time
import numpy
import pyfftw
import multiprocessing
nthread = multiprocessing.cpu_count()
a = numpy.random.rand(2364,2756).astype('complex128')
""" 
Uncomment below to use 32 bit floats, 
increasing the speed by a factor of 4
and remove the difference between the "builders" and "FFTW" methods
"""
#a = numpy.random.rand(2364,2756).astype('complex64')

start = time.time()
b1 = numpy.fft.fft2(a)
end1 = time.time() - start

start = time.time()
b2 = pyfftw.interfaces.scipy_fftpack.fft2(a, threads=nthread)
end2 = time.time() - start

pyfftw.forget_wisdom()
start = time.time()
b3 = pyfftw.interfaces.numpy_fft.fft2(a, threads=nthread)
end3 = time.time() - start

""" By far the most efficient method """
pyfftw.forget_wisdom()
start = time.time()
b4 = numpy.zeros_like(a)
fft = pyfftw.FFTW( a, b4, axes=(0,1), direction='FFTW_FORWARD', flags=('FFTW_MEASURE', ), threads=nthread, planning_timelimit=None )
fft()
end4 = time.time() - start

""" 
For large arrays avoiding the copy is very important, 
doing this I get a speedup of 2x compared to not using it 
"""
pyfftw.forget_wisdom()
start = time.time()
b5 = numpy.zeros_like(a)
fft = pyfftw.builders.fft2(a, s=None, axes=(-2, -1), overwrite_input=False, planner_effort='FFTW_MEASURE', threads=nthread, auto_align_input=False, auto_contiguous=False, avoid_copy=True)
b5 = fft()
end5 = time.time() - start



print('numpy.fft.fft2:                        %.3f secs.' % end1)
print('pyfftw.interfaces.scipy_fftpack.fft2:  %.3f secs.' % end2)
print('pyfftw.interfaces.numpy_fft.fft2:      %.3f secs.' % end3)
print('pyfftw.FFTW:                           %.3f secs.' % end4)
print('pyfftw.builders:                       %.3f secs.' % end5)

Example output times on my 4 core i5 CPU, using 64 bit floats:

numpy.fft.fft2:                        1.537 secs.
pyfftw.interfaces.scipy_fftpack.fft2:  0.248 secs.
pyfftw.interfaces.numpy_fft.fft2:      0.248 secs.
pyfftw.FFTW:                           0.084 secs.
pyfftw.builders:                       0.143 secs.

Example output times on my 4 core i5 CPU, using 32 bit floats:

numpy.fft.fft2:                        1.414 secs.
pyfftw.interfaces.scipy_fftpack.fft2:  0.066 secs.
pyfftw.interfaces.numpy_fft.fft2:      0.066 secs.
pyfftw.FFTW:                           0.043 secs.
pyfftw.builders:                       0.043 secs.

**steveo225** · Accepted Answer

steveo225 On 29 August 2014 at 11:40 BEST ANSWER

The solution I found was to use the builders interface:

fft = pyfftw.builders.fft2(a, overwrite_input=True, planner_effort='FFTW_ESTIMATE', threads=multiprocessing.cpu_count())
b = fft()

Fastest method to do an FFT

There are 2 best solutions below

Related Questions in PYTHON-3.X

Related Questions in NUMPY

Related Questions in FFT

Related Questions in PYFFTW

Trending Questions

Popular # Hahtags

Popular Questions