Using gprof and speeding up element-wise array multiplication in C++

88 Views Asked by At

I am using gprof as a profiler for my code for the purpose of enhancing efficiency/speed of my code and possibly parallelizing it eventually. I was able to profile the code and found out big portion of the total runtime of the code is being used by "multiplication functions" or functions that perform element wise multiplication for 2d arrays by each other or by a scalar and so on. I am aware of how multiplication and division can be computationally expensive, and so would love some suggestion for my specific examples. But honestly I am not sure what options do I have to speed up an element-wise multiplication?

The output of the profiler:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.01      0.02     0.02       90     0.22     0.22  scalArr2DMult(double*, double, double*)
 25.01      0.04     0.02       40     0.50     0.50  Arr3DArr2DMult(double (*) [2], double*, double (*) [2])
 25.01      0.06     0.02       34     0.59     0.59  Arr2DArr2DMult(double*, double*, double*)

where:

%         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
       else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
       function is profiled, else blank.

Now, for example scalArr2DMult is a function that multiplies a 2D physical array by a scalar. (Division handled by just modifying scal outside of the function.):

void scalArr2DMult(double arr[], double scal, double arrOut[]){
    for(int i = 0; i < nx; i++){
        for( int j = 0; j < ny; j++){
            arrOut[j + ny*i] = arr[j + ny*i] * scal;            
        }
    }
}

Another example is the function Arr3DArr2DMult where it multiplies a 3D Fourier array by a 2D real valued array, hence, k indexing:

void Arr3DArr2DMult(double arr3D[][ncomp], double arr2D[], double arrOut[][ncomp]){
    for(int i = 0; i < nx; i++){
        for(int j = 0; j < nyk; j++){
            for (int k = 0; k < ncomp; k++){
                arrOut[j + nyk*i][k] = arr3D[j + nyk*i][k] * arr2D[j + nyk*i];
            }
        }
    }   
}

Where, nx, ny, nyk, ncomp are defined:

static const int nx = 256; 
static const int ny = 256; 
static const int nyk = ny/2 + 1;
static const int ncomp = 2;

I understand that there are more sophisticated ways of dealing with element-wise array multiplications such as using something like Eigen (I am mostly struggling with piecing the fttw3h library with Eigen and not sure why I am running into issues with that) or writing all matrices as instances of a Matrix<T> class (which I am currently also trying to test) but for the sake of simplicity is there an easier fix for a beginner like me to try first? Thanks.

0

There are 0 best solutions below