Branchless way to add two UINT while avoiding overflow?

165 Views Asked by At

I'm curious if there's a branchless way to do this, or perhaps just a generally better way:

uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
  if (UINT16_MAX - delta < num)
  {
    return UINT16_MAX;
  }

  return num + delta;
}

I tried thinking about it with min/max, but even if I try to use (well-defined) uint overflow, the best I can come up with is max(num + delta, num) which would return the original number a if overflown, or the result otherwise.

But I can't think of a way with min/max/clamp to get from this split to the desired behaviour (without branching)

4

There are 4 best solutions below

1
On

Stop fiddling around, keep it simple and rely on the optimizer.

Unless your optimizer is not up to sniff of course. Then by all means continue fiddling around.

0
On

Why not just the obvious

uint16_t add_no_over(uint16_t a, uint16_t b)
{
        uint32_t sum = a + b;
        return sum < 65535 ? sum : 65535;
}

It gives the following assembly code with gcc-11 for esp32 from godbolt:

        entry   sp, 32
        l32r    a8, .LC0
        extui   a3, a3, 0, 16
        extui   a2, a2, 0, 16
        add.n   a2, a2, a3
        minu    a2, a2, a8
        extui   a2, a2, 0, 16
        retw.n
0
On

Here's my attempt at it:

uint16_t add(uint16_t a, uint16_t b) {
    uint32_t i = a + b;
    return i & 0xffff | -(i >> 16);
}

A single addition can only overflow by at most one bit. So, if we add them as 32-bit numbers and they overflow, the upper 16 bits of the result will contain the value 1.

So, we shift that right 16 places, to get either 0 or 1. Then we negate it to get 0 or -1 (which converts to the maximum value as an unsigned). Then we or the result with the 16 bits produced by the addition. If it's 0, that won't affect the result. If it was -1 converted to unsigned, that'll have all the bits set, so when we or it with the previous result, we still get all bits set (which is the maximum value for an unsigned).

For the ESP32, gcc 11.2 produces the following:

add(unsigned short, unsigned short):
        entry   sp, 32
        extui   a2, a2, 0, 16
        extui   a3, a3, 0, 16
        add.n   a8, a2, a3
        extui   a2, a8, 16, 16
        neg     a2, a2
        or      a2, a2, a8
        extui   a2, a2, 0, 16
        retw.n

The only branch in sight is the return statement...

https://godbolt.org/z/ezhxY56qx

Of course, with a different compiler, or possibly even a different set of flags to the same compiler, it could generate a branch. But at least in a quick test on a couple dozen or so different compilers for half a dozen or so processors, I didn't see any branches generated (though compilation did simply fail on the 6502 compiler, which apparently doesn't support an unsigned long at all).

4
On

EDIT 2: I checked my specific platform (Xtensa-Esp32). Seems that all versions contain a branch, even simple boolean comparisons (a > b)

So in my case the simplest if statement in the title might be the best choice, even if the different methods are interesting on their own.

EDIT 1: Played with ChatGPT on this - after coaxing out the bugs maybe a better implementation?

uint16_t add_no_overflow(uint16_t num, uint16_t delta) {
    uint16_t sum = num + delta;
    uint16_t did_overflow = (sum > num)-1; // 0xFFFF if overflown, 0x0000 otherwise
    return (sum & ~did_overflow) | (UINT16_MAX & did_overflow);
}

Thought about this immediately after posting. The magic of rubber ducking!

Implementation with min/max (not necessarily branchless)

uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
  uint16_t final_delta = min(UINT_16_T_MAX-num, delta);
  return num + final_delta;
}