I am using gprof as a profiler for my code for the purpose of enhancing efficiency/speed of my code and possibly parallelizing it eventually. I was able to profile the code and found out big portion of the total runtime of the code is being used by "multiplication functions" or functions that perform element wise multiplication for 2d arrays by each other or by a scalar and so on. I am aware of how multiplication and division can be computationally expensive, and so would love some suggestion for my specific examples. But honestly I am not sure what options do I have to speed up an element-wise multiplication?
The output of the profiler:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
25.01 0.02 0.02 90 0.22 0.22 scalArr2DMult(double*, double, double*)
25.01 0.04 0.02 40 0.50 0.50 Arr3DArr2DMult(double (*) [2], double*, double (*) [2])
25.01 0.06 0.02 34 0.59 0.59 Arr2DArr2DMult(double*, double*, double*)
where:
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
Now, for example scalArr2DMult is a function that multiplies a 2D physical array by a scalar. (Division handled by just modifying scal outside of the function.):
void scalArr2DMult(double arr[], double scal, double arrOut[]){
for(int i = 0; i < nx; i++){
for( int j = 0; j < ny; j++){
arrOut[j + ny*i] = arr[j + ny*i] * scal;
}
}
}
Another example is the function Arr3DArr2DMult where it multiplies a 3D Fourier array by a 2D real valued array, hence, k indexing:
void Arr3DArr2DMult(double arr3D[][ncomp], double arr2D[], double arrOut[][ncomp]){
for(int i = 0; i < nx; i++){
for(int j = 0; j < nyk; j++){
for (int k = 0; k < ncomp; k++){
arrOut[j + nyk*i][k] = arr3D[j + nyk*i][k] * arr2D[j + nyk*i];
}
}
}
}
Where, nx, ny, nyk, ncomp are defined:
static const int nx = 256;
static const int ny = 256;
static const int nyk = ny/2 + 1;
static const int ncomp = 2;
I understand that there are more sophisticated ways of dealing with element-wise array multiplications such as using something like Eigen (I am mostly struggling with piecing the fttw3h library with Eigen and not sure why I am running into issues with that) or writing all matrices as instances of a Matrix<T> class (which I am currently also trying to test) but for the sake of simplicity is there an easier fix for a beginner like me to try first? Thanks.