I want to reduce the time to compute 2D FFT in C++ on 100 million complex data

361 Views Asked by At

I am trying to compute 2D FFT on 100 million complex data (100000x1000) and it is taking 4.6 seconds approximately, but I want to reduce time. Then I tried to compute it using fftw_thread. But then the computation time has increased (in 2 threads time taken - 8.5 sec an in 4 threads time taken - 16.5 sec). I am using FFTW3 library for C++ and OS - ubuntu 18.04 I am attaching the C++ code below :

#include <iostream>
#include <time.h>
#include <fftw3.h>
using namespace std;
#define ROW 100000
#define COL 1000

int main() {
        fftwf_complex *in = (fftwf_complex *)calloc(ROW*COL,sizeof(fftwf_complex));
        fftwf_complex *out = (fftwf_complex *)calloc(ROW*COL,sizeof(fftwf_complex));

        // generating random data
        for(uint32_t i = 0 ; i < ROW*COL ; i++) {
            in[i][0] = i+1;
            in[i][1] = i+2;
        }
        int thread_number = 2;
        fftwf_plan_with_nthreads(thread_number);
        int h = fftwf_init_threads();
        fftwf_plan p = fftwf_plan_dft_2d(ROW,COL,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
        fftwf_execute(p);
        fftwf_destroy_plan(p);
        fftwf_cleanup_threads();
}

I am getting no error. I want to reduce the execution time. Can anyone please help me in this matter to reduce the time to compute the 2D FFT on 100 million data.

1

There are 1 best solutions below

4
On

How did you measure execution time? Note that the actual FFT is done with fftwf_execute. The rest is initialization and cleaning up. See the code below (if you are not on Linux modify time_in_secs to fit to your system). On my computer the code below takes around 10 seconds with one thread, 6 seconds with two threads and around 3.6 seconds with four threads. That's for the FFT part (t3-t2).

#include <iostream>
#include <time.h>
#include <fftw3.h>
#define ROW 100000
#define COL 1000

double
time_in_secs()
{
  struct timespec   t;

  clock_gettime( CLOCK_MONOTONIC /* CLOCK_PROCESS_CPUTIME_ID */, &t );

  return (double)t.tv_sec + 1.0E-09 * (double)t.tv_nsec;
}


int main() {
        fftwf_complex *in = (fftwf_complex *)calloc(ROW*COL,sizeof(fftwf_complex));
        fftwf_complex *out = (fftwf_complex *)calloc(ROW*COL,sizeof(fftwf_complex));

        // generating random data
        for(uint32_t i = 0 ; i < ROW*COL ; i++) {
            in[i][0] = i+1;
            in[i][1] = i+2;
        }
        int thread_number = 6;
        
        double  t1 = time_in_secs();
        
        fftwf_plan_with_nthreads(thread_number);
        int h = fftwf_init_threads();
        fftwf_plan p = fftwf_plan_dft_2d(ROW,COL,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
        
        double  t2 = time_in_secs();
        
        fftwf_execute(p);
        
        double  t3 = time_in_secs();
        
        fftwf_destroy_plan(p);
        fftwf_cleanup_threads();
        
        std::cout << "Time for init: " << t2-t1 << " sec\n";
        std::cout << "Time for FFT:  " << t3-t2 << " sec\n";
        std::cout << "Total time:    " << t3-t1 << " sec\n";
        std::cout << "# threads:     " << thread_number << '\n';
}

Speeding up the initialization process can be done utilizing wisdom as shown below. In the first run of the program the wisdom file will not be found. Computation of the plan takes its time. In successive calls the wisdom will be used for accelerated computation of the plan. Notice that fftwf_init_threads must be called before the wisdom file gets read.

        double  t1 = time_in_secs();
        
        fftwf_plan_with_nthreads(thread_number);
        int h = fftwf_init_threads();
        
        const char * wisdom_file = "fftw_wisdom.dat";
        FILE   *w_file= fopen( wisdom_file, "r" );
        if( w_file )
        {
          int ec = fftwf_import_wisdom_from_file( w_file );
          fclose( w_file );
          std::cout << "Read wisdom file " << ec << '\n';
        }
        else
        {
          std::cout << "No wisdom file found\n";
        }
        
        fftwf_plan p = fftwf_plan_dft_2d(ROW,COL,in,out,FFTW_FORWARD,FFTW_MEASURE);
         
        w_file= fopen( wisdom_file, "w" );
        if( w_file )
        {
          fftwf_export_wisdom_to_file( w_file );
          fclose( w_file );
          std::cout << "Wrote wisdom file\n";
        }
        
        double  t2 = time_in_secs();

Compared to the initial example we have set the planner flag to FFTW_MEASURE. This makes the effect of wisdom storage more pronounced.