Fast saturating integer conversion?

618 Views Asked by At

I'm wondering if there is any fast bit twiddling trick to do a saturating conversion from a 64-bit unsigned value to a 32-bit unsigned value (it would be nice if it is generalized to other widths but that's the main width I care about). Most of the resources I've been able to find googling have been for saturating arithmetic operations.

A saturating conversion would take a 64-bit unsigned value, and return either the value unmodified as a 32-bit value or 2^32-1 if the input value is greater than 2^32-1. Note that this is not what the default C casting truncating behaviour does.

I can imagine doing something like:

  • Test if upper half has any bit set
  • If so create a 32-bit mask with all bits set, otherwise create a mask with all bits unset
  • Bitwise-or lower half with mask

But I don't know how to quickly generate the mask. I tried straightforward branching implementations in Godbolt to see if the compiler would generate a clever branchless implementation for me but no luck.

Implementation example here.

#include <stdint.h>
#include <limits.h>

// Type your code here, or load an example.
uint32_t square(uint64_t num) {
    return num > UINT32_MAX ? UINT32_MAX : num;
}

Edit: my mistake, issue was godbolt not set to use optimizations

2

There are 2 best solutions below

2
On BEST ANSWER

You don't need to do any fancy bit twiddling trick to do this. The following function should be enough for compilers to generate efficient code:

uint32_t saturate(uint64_t value) {
    return value > UINT32_MAX ? UINT32_MAX : value;
}

This contains a conditional statement, but most common CPUs, like AMD/Intel and Arm ones, have conditional move instructions. So they will test the value for overflowing 32 bits, and based on the test they will replace it with UINT32_MAX, otherwise leave it alone. For example, on 64-bit Arm processors this function will be compiled by GCC (to:

saturate:
  mov x1, 4294967295
  cmp x0, x1
  csel x0, x0, x1, ls
  ret

Note that you must enable compiler optimizations to get the above result.

0
On

A way to do this without relying on conditional moves is

((-(x >> 32)) | (x << 32)) >> 32