Why my C code is slowed down if I am using += operator

Question

Why my C code is slowed down if I am using += operator

114 Views Asked by Stéphane Guillaso At 22 December 2023 at 12:23

Edit I remove some part of the question because it might be confusing > and thanks of answers I should be able to found out by myself.

I am trying to write a program that resample a signal using a spectral function. I am writing the code in C but I discover the problem coding in C++.

Here is a portion of the code I am using. I have a function test() that call another function test3(tmp) passing an array as pointer. As long as I combine the += operator and I am trying to fill the array of the first function, my code is slowing down.

I initialize and calculate using zeros but all arrays must have double values.

#include <stdio.h>
#include <stdlib.h>

void test3(double *tmp);

void test() {
    double *tmp;
    int k;
    tmp = (double *) calloc(250 * 6, sizeof(double));
    for (k = 0; k < 1000; ++k)
        test3(tmp);
    free(tmp);
}

void test3(double *tmp) {
    int k;
    double *fx;
    double *srf;
    double *ptr;
    int p;
    int l;
    int c;
    double a;
    fx = (double *) calloc(2100 * 6, sizeof(double));
    srf = (double *) calloc(2100 * 250, sizeof(double));
    
    for (int k = 0; k < 2100 * 6; ++k)   fx[k] = 0;
    for (int k = 0; k < 2100 * 250; ++k) srf[k] = 0;
    
    ptr = (double *) calloc(250 * 6, sizeof(double));
    
    for (p = 0; p < 6; ++p) {
        for (l = 0; l < 250; ++l) {
            ptr[l + p*250] = 0.0;
            for (c = 0; c < 2100; ++c)
                ptr[l+p*250] += fx[c+p*2100] * srf[c+l*2100];
        }
    }
    for (k = 0; k < 250 * 6; ++k) tmp[k] = ptr[k];
    
    free(ptr);
    free(fx);
    free(srf);
}

int main(int argc, char *argv[]) {
    test();
    return 0;
}

I compile using:

gcc -O2 -o test test.c -lm

When I run the code time ./test I get

./test_lut  3.84s user 0.11s system 98% cpu 4.008 total

Original Q&A

There are 2 best solutions below

**chqrlie** · Answer 1 · 2023-12-22T13:13:39.023000

The compiler does a good job at optimizing out useless code, but not always:

if you remove the line for (k = 0; k < 250*6; ++k) tp[k] = ptr[k]; you essentially remove all side effects of the function test3(): you initialize arrays pointed to by fx and srf and compute the entries in the one pointed to by ptr, then free these arrays. The compiler seems able to determine that no object is impacted that survives after returning from the function and removes the code altogether, resulting in almost no time spent in the function.
if you change += to =, the loop
```
          for (c = 0; c < 2100; ++c)
              ptr[l+p*250] += fx[c+p*2100] * srf[c+l*2100];
```
keep modifying the same location, hence only the last iteration is useful so the code is reduced to ptr[l+p*250] += fx[2099+p*2100] * srf[2099+l*2100]; resulting in much faster operation.
the third attempt using a temporary variable a has no impact as the compiler probably already optimizes the constant destination out of the loop.

You can investigate the compiler work using Godbolt's Compiler Explorer and play with the code and compiler flags.

As can be seen analyzing the code generated, neither gcc not clang generate any code for the initialization loops (setting memory allocated with calloc() to all bit zero value 0.0 has no effect) and convert the final for loop to a call to memcpy. Commenting this final for loop causes both compilers to generate no code and changing the += to = simplifies the inner loop code.

Interestingly, albeit both compilers correctly determine that fx and srf point to arrays with all elements equal to 0.0, they do not assume that the contents of these arrays stay null for the whole computation and thus all results in the destination array should be null too as ptr[l+p*250] += fx[c+p*2100] * srf[c+l*2100]; simplifies to ptr[l+p*250] += 0.0;

If you compile the code as C (not C++) and define fx, srf and ptr as

    double * restrict fx;
    double * restrict srf; 
    double * restrict ptr;

Both compilers should determine that the arrays fx and srf cannot be modified as a side effect of writing via ptr, hence remain null. The code generated would be drastically simplified and should execute almost instantly, but this optimisation does not seem to be effective.

**Eric Postpischil** · Answer 2 · 2023-12-22T13:16:16.040000

When for (k = 0; k < 250*6; ++k) tmp[k] = ptr[k]; is removed, test3 has no side effects (produces no results for the program), and the compiler optimizes main to the two instructions xor eax, eax and ret.

When += is changed to =, the value left in ptr[l+p*250] by ptr[l+p*250] = fx[c+p*2100] * srf[c+l*2100]; is only that of the last iteration of for (c = 0; c < 2100; ++c), so the compiler is able to replace the loop with a single iteration in which c is 2099.

Why my C code is slowed down if I am using += operator

There are 2 best solutions below

Related Questions in C

Related Questions in PERFORMANCE

Related Questions in OPERATOR-KEYWORD

Trending Questions

Popular # Hahtags

Popular Questions