I need a very precise timing, so I wrote some assembly code (for ARM M0+). However, the timing is not what I expected when measuring on an oscilliscope.
#define LOOP_INSTRS_CNT 4 // subs: 1, cmp: 1, bne: 2 (when branching)
#define FREQ_MHZ (BOARD_BOOTCLOCKRUN_CORE_CLOCK / 1000000)
#define DELAY_US_TO_CYCLES(t_us) ((t_us * FREQ_MHZ + LOOP_INSTRS_CNT / 2) / LOOP_INSTRS_CNT)
static inline __attribute__((always_inline)) void timing_delayCycles(uint32_t loopCnt)
{
// note: not all instructions take one cycle, so in total we have 4 cycles in the loop, except for the last iteration.
__asm volatile(
".syntax unified \t\n" /* we need unified to use subs (not for sub, though) */
"0: \t\n"
"subs %[cyc], #1 \t\n" /* assume cycles > 0 */
"cmp %[cyc], #0 \t\n"
"bne.n 0b\t\n" /* this instruction costs 2 cycles when branching! */
: [cyc]"+r" (loopCnt) /* actually input, but we need a temporary register, so we use a dummy output so we can also write to the input register */
: /* input specified in output */
: /* no clobbers */
);
}
// delay test
#define WAIT_TEST_US 100
gpio_clear(PIN1);
timing_delayCycles(DELAY_US_TO_CYCLES(WAIT_TEST_US));
gpio_set(PIN1);
So pretty basic stuff. However, the delay (measured by setting a GPIO pin low, looping, then setting high again) timing is consistently 50% higher than expected. I tried for low values (1 us giving 1.56 us), up to 500 ms giving 750 ms.
I tried to single step, and the loop really does only the 3 steps: subs (1), cmp (1), branch (2). Paranthesis is number of expected clock cycles.
Can anybody shed light on what is going on here?
After some good suggestions I found the issue can be resolved in two ways:
Note: If anybody copies the above code, note that you can delete the cmp, since the subs has the s flag set. If doing so, remember to set instruction count to 3 instead of 4. This will give you a better time resolution.