cortexa7 CPU(s) take much longer to execute a loop than cortexa15 CPU(s)

Question

40 Views Asked by Thảo M. Hoàng At 17 August 2025 at 12:41

I am testing CPU performance. I used two boards with ARMv7 and SMP support: a dual-core cortexa15@1.5GHz and a dual-core cortexa7@1GHz.

I then ran the simple loop below and measured its execution time:

#define DEFAULT_CALC_LOOPS 1000
#define LOOPS_MULTIPLIER 4.2
...
loops = DEFAULT_CALC_LOOPS;
...
void *calc(int loops)
{
    int i, j;
    for (i = 0; i < loops * LOOPS_MULTIPLIER; i++) {
        for (j = 0; j < 125; j++) {
            // Sum of the numbers up to J
            volatile int temp = j * (j + 1) / 2;
            (void)temp;
        }
    }
    return NULL;
}
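The timing code itself is not shown in the question. A minimal sketch of one way to take the measurement, assuming a POSIX system with `clock_gettime` and `CLOCK_MONOTONIC`, could look like this; `elapsed_ms` and `time_calc_ms` are hypothetical helper names, not part of the original program:

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

#define LOOPS_MULTIPLIER 4.2

/* Same loop as in the question. */
void *calc(int loops)
{
    int i, j;
    for (i = 0; i < loops * LOOPS_MULTIPLIER; i++) {
        for (j = 0; j < 125; j++) {
            /* Sum of the numbers up to j */
            volatile int temp = j * (j + 1) / 2;
            (void)temp;
        }
    }
    return NULL;
}

/* Difference between two timestamps, in milliseconds. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical helper: time one call to calc() with the monotonic clock. */
double time_calc_ms(int loops)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    calc(loops);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ms(start, end);
}
```

Calling `time_calc_ms(1000)` several times and keeping the minimum or median should reduce scheduler noise. The compiler optimization level also changes the generated instructions, so both boards should run the same binary.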

Results on the two boards after a variety of tests:

cortexa15: ~1.2 ms
cortexa7: ~5 ms

There is a big difference between these results.

Is there any dependency or limitation that affects the results? Can anyone with experience in this area share some ideas? Thanks.


**Thảo M. Hoàng** · Accepted Answer

In my experience, the cortexa15 delivers over 2x to 3x the performance of the cortexa7. In addition, my boards are cortexa15@1.5GHz and cortexa7@1GHz, so I think the result above is reasonable.

Below is a worked example for the cortexa15 that estimates the execution time.

Formula for CPU time:

CPU execution time = I x CPI x C

I: instruction count

CPI: cycles per instruction (CPI = 1/IPC)

C: clock cycle time (1 / CPU clock frequency), in seconds

Reference: https://en.wikipedia.org/wiki/Instructions_per_second

Consider the dual-core cortexa15 (the same as on the iWave G1M/N).

The cortexa15 executes 9,900 MIPS at 1.5 GHz, so the average IPC = 9,900 / 1,500 = 6.6

CPI = 1/IPC = 1/6.6 = 0.1515 cycles/instruction

The G1M/N has a maximum clock of 1.5 GHz (clock range ~1.3 GHz - 1.5 GHz). I assume the board runs at its best (1.5 GHz).

C = 1/(1.5 x 10^9 Hz) = 0.6667 ns
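The derivation of CPI and C above can be sketched in C as follows. The function names are hypothetical and chosen only for illustration; the inputs are the published MIPS figure and the clock frequency quoted in this answer:

```c
/* IPC implied by a MIPS figure at a given clock frequency in MHz. */
double ipc_from_mips(double mips, double clock_mhz)
{
    return mips / clock_mhz;
}

/* CPI is the reciprocal of IPC. */
double cpi_from_ipc(double ipc)
{
    return 1.0 / ipc;
}

/* Clock cycle time in nanoseconds for a frequency in GHz. */
double clock_period_ns(double freq_ghz)
{
    return 1.0 / freq_ghz;
}
```

With the figures above, `ipc_from_mips(9900.0, 1500.0)` gives about 6.6, `cpi_from_ipc(6.6)` about 0.1515, and `clock_period_ns(1.5)` about 0.6667.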

Translate the C code to assembly for the ARM architecture:

for (i = 0; i < loops * LOOPS_MULTIPLIER; i++) {
    for (j = 0; j < 125; j++) {
        // Sum of the numbers up to J
        volatile int temp = j * (j + 1) / 2;
        (void)temp;
    }
}

Reference: https://godbolt.org

Counting the instructions in the generated assembly:

I = (((9 + 9) * 125) + 17) * 1000 * 4.2 = 9521400
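The count can be reproduced with a small helper. This is only a sketch: it assumes, from the shape of the formula, that 9 + 9 is the number of instructions per inner iteration and 17 the per-outer-iteration overhead. The answer does not spell this out, and the parameter names are hypothetical:

```c
/*
 * Total instructions for a nested loop:
 * (inner_instr * inner_iters + outer_overhead) * outer_iters
 */
long instruction_count(long inner_instr, long inner_iters,
                       long outer_overhead, long outer_iters)
{
    return (inner_instr * inner_iters + outer_overhead) * outer_iters;
}
```

Here the outer loop runs 1000 * 4.2 = 4200 times, so `instruction_count(9 + 9, 125, 17, 4200)` returns 9521400.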

CPU execution time = 9521400 x 0.1515 x 0.6667 ns = 0.000962 seconds, so the loop takes approximately 0.962 ms with the CPU at its best (1.5 GHz).

In the worst case (at 1.3 GHz), the CPU time for the loop is around 1.11 ms.

In testing, I measured values close to these.

--

The same calculation for cortexa7@1GHz, with IPC = 1.9 (CPI = 1/1.9) and C = 1 ns:

CPU execution time = 9521400 x (1/1.9) x 1 ns = 5.011 ms
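Both estimates can be checked with one function. This is a minimal sketch of the formula used in this answer, with the assumption that the same instruction count applies to both cores; `estimate_ms` is a hypothetical name:

```c
/*
 * Estimated execution time in milliseconds.
 * instructions / ipc gives cycles; 1 GHz is 1e6 cycles per millisecond.
 */
double estimate_ms(double instructions, double ipc, double freq_ghz)
{
    double cycles = instructions / ipc;
    return cycles / (freq_ghz * 1e6);
}
```

`estimate_ms(9521400.0, 6.6, 1.5)` gives about 0.962 ms, `estimate_ms(9521400.0, 6.6, 1.3)` about 1.11 ms, and `estimate_ms(9521400.0, 1.9, 1.0)` about 5.011 ms. The estimated ratio between the two cores is therefore roughly 5.2, against a measured ratio of roughly 4.2 (~5 ms vs ~1.2 ms).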
