I am using this GPU for my thesis at university.
I am running a lot of different kernels on it, and the execution time is stuck at 12666.6689 ms, even though I have a loop with 88 instructions * 100m iterations.
__kernel void scalar_mult_add(__global int * list)
{
    unsigned int x = 38;
    unsigned int y = 38;
    for (int i = 0; i < 1000000; i++) {
        y = x * y;
        x = x + y;
    }
}
The only thing that makes the execution time increase is adding x != 0 to the for loop's condition:
__kernel void scalar_mult_add(__global int * list)
{
    unsigned int x = 38;
    unsigned int y = 38;
    for (int i = 0; i < 1000000 && x != 0; i++) {
        y = x * y;
        x = x + y;
    }
}
Why does this happen every time? I can't understand it: e.g. 88 million instructions take the same execution time as 1 million instructions, even though the GPU doesn't have anywhere near enough execution units to run an 88-million-instruction kernel in the same time as a 1-million-instruction one.
Why does adding a single x != 0 check to the loop increase the execution time that much, while a couple of extra additions inside the for loop do not?
In the original case the loop doesn't do anything: its output isn't kept and the loop result isn't used in any further computation. As noted in the comments above, the compiler is probably just optimizing the loop away.
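One way to defeat this dead-code elimination is to make the loop result observable by writing it to the output buffer. A minimal sketch based on your kernel (the choice of storing x + y at list[get_global_id(0)] is an illustrative assumption; any store of the result will do):

__kernel void scalar_mult_add(__global int * list)
{
    unsigned int x = 38;
    unsigned int y = 38;
    for (int i = 0; i < 1000000; i++) {
        y = x * y;
        x = x + y;
    }
    // Storing the result makes it observable, so the
    // compiler can no longer discard the loop as dead code.
    list[get_global_id(0)] = (int)(x + y);
}

With the store in place, the measured time should scale with the iteration count instead of staying constant.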
Adding x != 0 to the loop condition means that the loop result is "used": you need the previous iteration's result to determine whether you keep iterating. The code is still pointless (no output), but the compiler no longer sees it as dead code, so the loop stays in.

However, note that the Mali-T624 has a SIMD vector unit. Writing a dependent scalar loop like this effectively kills any auto-vectorization in the compiler. I highly recommend using vec4 data types for the computation.
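As a rough sketch of what a vectorized variant could look like (the lane constants, iteration count, and final store are all illustrative assumptions, not a tuned implementation):

__kernel void scalar_mult_add_vec4(__global int4 * list)
{
    // Four independent lanes per work-item: the T624's SIMD
    // unit can process all four lanes in one vector operation.
    uint4 x = (uint4)(38, 39, 40, 41);
    uint4 y = (uint4)(38, 38, 38, 38);
    for (int i = 0; i < 250000; i++) {  // 1/4 the iterations for the same total work
        y = x * y;
        x = x + y;
    }
    // Store the result so the loop isn't optimized away.
    list[get_global_id(0)] = convert_int4(x + y);
}

The key point is that each of the four lanes is an independent dependency chain, so the compiler can map them onto the vector unit instead of serializing a single scalar chain.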
If you want some basic static analysis, have a look at the Mali Offline Compiler, which is a free download as part of Arm Mobile Studio. Note that compiling OpenCL kernels requires macOS or Linux, but if you are on Windows you can run the Linux binary under WSL.