Shifting, types and sign extensions in C


I have the following code:

unsigned char chr = 234; // 1110 1010
unsigned long result = 0;
result = chr << 24;

And now result will equal 18446744073340452864, which is 1111 1111 1111 1111 1111 1111 1111 1111 1110 1010 0000 0000 0000 0000 0000 0000 in binary.

Why is there sign extension being done, when chr is unsigned?

Also, if I change the shift from 24 to 8, then result is 59904, which is 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 1010 0000 0000 in binary. Why is there no sign extension here? (Any shift of 23 or less has no sign extension applied to it.)

Also on my current platform sizeof(long) is 8.

What are the rules for automatically converting to larger types when shifting? It seems to me that if the shift is 23 or less then chr gets converted to an unsigned type, and if it's 24 or more it gets converted to a signed type? (And why is sign extension being done at all with a left shift?)


There are 2 best solutions below

0
On

With chr = 234, the expression chr << 24 is evaluated in isolation: chr is promoted to (a 32-bit signed) int and shifted left 24 bits, which on typical two's-complement implementations yields a negative int value (formally the shift overflows int, so the behaviour is undefined). When you assign that to a 64-bit unsigned long, the sign bit is propagated through the most significant 32 bits of the 64-bit value. Note that the method of calculating chr << 24 is not itself affected by what the value is assigned to.

When the shift is just 8 bits, the result is a positive (32-bit signed) int, and that sign bit (0) is propagated through the most significant 32 bits of the unsigned long.
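
A minimal sketch of that second step, the widening, on its own. It deliberately skips the overflowing shift (which is formally undefined) and starts from the int values the shift produces on a typical machine. The helper name widen_to_u64 is made up for illustration, and int32_t / uint64_t are assumed stand-ins for a 32-bit int and a 64-bit unsigned long.

```c
#include <stdint.h>

/* Hypothetical helper: models assigning a 32-bit int to a 64-bit
   unsigned long. int32_t and uint64_t stand in for those types. */
uint64_t widen_to_u64(int32_t value)
{
    /* Conversion to an unsigned type works on the value, modulo 2^64,
       so a negative input ends up with all upper 32 bits set. */
    return (uint64_t)value;
}
```

Under those assumptions, widen_to_u64(-369098752) gives 18446744073340452864 and widen_to_u64(59904) gives 59904, matching the two results in the question.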

0
On

To understand this it's easiest to think in terms of values.

Each integral type has a fixed range of representable values. For example, unsigned char usually ranges from 0 to 255; other ranges are possible and you can find your compiler's choice by checking UCHAR_MAX in limits.h.

When doing a conversion between integral types, if the value is representable in the destination type, then the result of the conversion is that value. (This may be a different bit pattern, e.g. because of sign extension.)

If the value is not representable in the destination type then:

  • for signed destinations, the behaviour is implementation-defined (which may include raising a signal).
  • for unsigned destinations, the value is adjusted modulo the maximum value representable in the type, plus one.

Modern systems typically handle the signed out-of-range conversion by truncating the excess high-order bits; if the result is still out of range, the bit pattern is kept and the value becomes whatever that bit pattern represents in the destination type.
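
Both out-of-range rules can be tried out with 8-bit destinations. This is only a sketch: the helper names to_u8 and to_i8 are invented for the example, and the signed result shown in the note is what GCC and Clang document, not something the standard guarantees.

```c
#include <stdint.h>

/* Unsigned destination: fully defined by the standard.
   The value is reduced modulo 2^8 (UINT8_MAX + 1). */
uint8_t to_u8(int value)
{
    return (uint8_t)value;
}

/* Signed destination: implementation-defined when out of range.
   GCC and Clang keep the low 8 bits and reinterpret them. */
int8_t to_i8(int value)
{
    return (int8_t)value;
}
```

to_u8(300) is 44 and to_u8(-1) is 255 on every conforming implementation. to_i8(200) is -56 on the common compilers, since 200 has the bit pattern 1100 1000, which reads as -56 in 8-bit two's complement.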


Moving onto your actual example.

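In C, there is something called the integer promotions. With << and >>, they are applied to each operand separately, and the result has the type of the promoted left operand; with the arithmetic operators, both operands are promoted and then brought to a common type. The effect of the integer promotions is that any value of a type smaller than int is converted to the same value with type int (or unsigned int, if int cannot represent every value of the original type).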
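
One way to see the promotion is to ask the compiler for the type of the shift expression. The sketch below uses C11's _Generic for that; the two function names are made up for illustration, and it assumes the usual case where int can represent every unsigned char value.

```c
/* Requires C11 for _Generic. Returns 1 if shifting an unsigned char
   yields an expression of type int, 0 for any other type.
   The controlling expression is only inspected, never evaluated. */
int shifted_uchar_is_int(void)
{
    return _Generic((unsigned char)234 << 8, int: 1, default: 0);
}

/* Same check with an unsigned long shift count: the type of the
   right operand does not change the type of the result. */
int shifted_by_ulong_is_int(void)
{
    return _Generic((unsigned char)234 << 8ul, int: 1, default: 0);
}
```

On such a platform both functions return 1: the unsigned char has already become a signed int before any shifting takes place.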

Further, for a signed left operand with a non-negative value, << 24 is defined as multiplication by 2^24 (where the result has the type of the promoted left operand), with undefined behaviour if the result is not representable in that type. (Informally: shifting into the sign bit causes UB.)
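
If you want to know in advance whether a particular shift stays inside int, a small check along these lines may help. It is a sketch, not a library routine: shift_overflows_int is a hypothetical name, and it only handles a non-negative value and a count smaller than the width of int.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if value << count would not fit in
   int, 0 otherwise. Assumes value >= 0 and 0 <= count < width of int.
   Shifting INT_MAX right is always well defined, so the check itself
   never overflows. */
int shift_overflows_int(int value, int count)
{
    return value > (INT_MAX >> count);
}
```

With a 32-bit int, shift_overflows_int(234, 24) is 1 and shift_overflows_int(234, 23) is 0, which lines up with the observation that shifts of 23 or less look fine.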

So, putting all the conversions explicitly, your code is

result = (unsigned long) ( ((int)chr) * 16777216 );

Now, the mathematical result of this multiplication is 3925868544, which, on a typical system with 32-bit int, is greater than INT_MAX (2147483647), so the behaviour is undefined.

If we want to explore results of this undefined behaviour on typical systems: what may happen is the same procedure I outlined earlier for out-of-range assignment. The bit-pattern of 3925868544 is of course 1110 1010 0000 0000 0000 0000 0000 0000. Treating this as the pattern of an int using 2's complement gives the int -369098752.

Finally we have the conversion of this value to unsigned long. -369098752 is out of range for unsigned long; and the rule for an unsigned destination is to adjust the value modulo ULONG_MAX+1. So the value you are seeing is 18446744073709551615 + 1 - 369098752.

If your intent was to do the calculation in unsigned long precision, you need to make the left-hand operand unsigned long, e.g. ((unsigned long)chr) << 24. (Note: writing 24ul won't work, because the type of the right-hand operand of << or >> does not affect the left-hand operand.)
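
As a small sketch of that fix wrapped in a function (the name shift_left_ul is just for illustration):

```c
/* Hypothetical helper: widen first, then shift, so the shift is
   carried out in unsigned long and no sign bit is involved.
   Assumes count is less than the width of unsigned long. */
unsigned long shift_left_ul(unsigned char chr, unsigned count)
{
    return (unsigned long)chr << count;
}
```

shift_left_ul(234, 24) gives 3925868544 and shift_left_ul(234, 8) gives 59904, with no sign extension in either case.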