Bitwise saturated addition in C (HW)

10.4k Views Asked by At

I'm working on an assignment and I can't figure out how to implement this. I have to make a function sadd(int x, int y) that returns the numbers added together unless it overflows (then just return the max possible int). I've been able to come up with some solutions involving casting and conditional statements, but those aren't allowed in the solution. Only the operators ~ ! ^ + << >> & and |.

2

There are 2 best solutions below

5
On

For addition of signed numbers, overflow has happened if you add two numbers of the same sign and get a result with a different sign. Because of the ranges involved, it is impossible to generate overflow when adding two numbers of different signs.

So, what you can do is — watching the sign bit only (the most significant one in two's complement) — use exclusive OR to get to whether the two original numbers differed in sign, complement that so that you've got '0' if they were different, '1' for the same.

You can then use exclusive OR on the result versus one of the inputs. That'll give '0' if they were the same, '1' if they were different.

And those two results together to get an overall '1' if the two inputs were the same but the result was different, '0' otherwise.

You can then use a combination of shifts and ORs to fill an entire integer with that value. Supposing you're in a 32 bit integer, just set the lowest 31 bits to get the highest value positive integer. What you can then do is a similar sets of shifts and ORs on the sign bit of either of the inputs. Exclusive OR the results. That'll instead give the lowest value integer if the inputs were negative.

EDIT: oh, and use the bit value of whether there was overflow, extended out to fill the int, to select what value to return by anding it with the result you would return if there were overflow, complementing it and anding it with the normal additive result, then oring (or adding) the two together.

Presto: all binary logic, no conditionals. I assume, because it's homework, that you don't want actual code?


Nine years later, I agree with the comment of @gman below; in order to implement saturating addition using only the permitted operators, you've got to rely on undefined behaviour — and the answer above implicitly does so.

A substantial risk with that is that compilers know which behaviour is undefined and may exploit that during optimisation. Knowledge of the underlying architecture (e.g. that it is two's complement, that it does signed shifts) is not sufficient to predict a compiler's output.

A robust production implementation is possible, but would require conditional statements and therefore would not answer this question.

1
On

From the ARM Website (ARM Info Center) on intrinsic functions:

4.1 Compiler intrinsics

Compiler intrinsics are functions provided by the compiler. They enable you to easily incorporate domain-specific operations in C and C++ source code without resorting to complex implementations in assembly language.
The C and C++ languages are suited to a wide variety of tasks but they do not provide in-built support for specific areas of application, for example, Digital Signal Processing (DSP). Within a given application domain, there is usually a range of domain-specific operations that have to be performed frequently. However, often these operations cannot be efficiently implemented in C or C++. A typical example is the saturated add of two 32-bit signed two’s complement integers, commonly used in DSP programming. The following example shows a C implementation of saturated add operation

#include <limits.h>
int L_add(const int a, const int b)
{
    int c;
    c = a + b;         // Needs to be   c = a + (unsigned)b; to avoid UB
    if (((a ^ b) & INT_MIN) == 0)
    {
        if ((c ^ a) & INT_MIN)
        {
            c = (a < 0) ? INT_MIN : INT_MAX;
        }
    }
    return c;
}

This example is unsafe in ISO C; signed integer overflow is undefined behaviour so you need to avoid causing it as part of detecting it. c = a + (unsigned)b; would avoid the problem, giving the expected binary integer result because 2's complement addition is the same as unsigned binary, but in C unsigned types have well-defined wrapping.