Do padding bits need to be preserved?

The MSP430X architecture is an extension of the 16-bit MSP430 architecture to a 20-bit address space. This is done by widening the processor's registers to 20 bits while keeping the smallest addressable unit at one octet (CHAR_BIT equals 8).

On this architecture, one could think of an implementation of the C programming language that provides a 20 bit integer type for int, using an 8 bit char, a 16 bit short and an emulated 32 bit long. Since 20 is not a multiple of CHAR_BIT, some padding bits are required when storing a variable of type int. For instance, one could store an int in four bytes, leaving one byte and four bits of another byte as padding.
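To make that layout concrete, here is a minimal sketch that emulates such a type on an ordinary host with fixed-width types. Everything in it is an assumption for illustration, not how any real MSP430X compiler stores int: little-endian byte order, two's complement, value and sign bits in bits 0 to 19 of a 4-byte object, and bits 20 to 31 as padding. `store_int20` and `load_int20` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical emulation of a 20-bit int stored in four 8-bit bytes.
   Assumed layout: little-endian, two's complement, value and sign bits
   in bits 0..19, padding in bits 20..31. */
#define INT20_MIN (-(INT32_C(1) << 19))      /* -524288 */
#define INT20_MAX ((INT32_C(1) << 19) - 1)   /*  524287 */
#define INT20_VALUE_MASK UINT32_C(0x000FFFFF)

/* Precondition: INT20_MIN <= v <= INT20_MAX.
   Writes the 20 value bits; this sketch writes the padding bits as zero. */
void store_int20(unsigned char obj[4], int32_t v)
{
    uint32_t bits = (uint32_t)v & INT20_VALUE_MASK;
    for (int i = 0; i < 4; i++)
        obj[i] = (unsigned char)(bits >> (8 * i));
}

/* Reads only the value bits; whatever the padding bits hold is ignored. */
int32_t load_int20(const unsigned char obj[4])
{
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++)
        bits |= (uint32_t)obj[i] << (8 * i);
    bits &= INT20_VALUE_MASK;              /* drop the padding bits */
    if (bits & UINT32_C(0x00080000))       /* bit 19 acts as the sign bit */
        return (int32_t)bits - (INT32_C(1) << 20);
    return (int32_t)bits;
}
```

Because `load_int20` masks the padding away, scribbling on the upper 12 bits of the object changes its representation but not its value.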

After reading what the standard says about padding bits in integer types, I'm unsure how they are supposed to behave. Since in this case the padding only exists for storage, its value can neither be set nor observed except by type punning. And even then, copying an object of this 20-bit type does not copy any padding bits. Are padding bits of this kind allowed by ISO 9899:2011?

Answer by Keith Thompson (best answer):

The C standard does not require padding bits to be copied by assignment. Assignment is specified in terms of values, not representations.

N1570 6.2.6.2p5 says:

The values of any padding bits are unspecified.

That's an unqualified statement, implying that they're unspecified in all circumstances, even after an assignment from an object that has some padding bits set.

By itself, that statement might be considered vague enough that it doesn't firmly establish that padding bits aren't necessarily copied.

Padding bits do not contribute to the value of an integer object. A footnote on the quoted sentence says:

All other combinations of padding bits are alternative object representations of the value specified by the value bits.

(The "other" refers to trap representations.)

6.5.16.1p2, describing simple assignment, says:

In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

The description is in terms of values, not representations; there is no implication that the representation of the RHS must be maintained in the LHS object. And of course the RHS in an assignment can be an arbitrary expression, not just an object reference. Even if it is just the name of an object, it undergoes lvalue conversion, described in 6.3.2.1p2; this conversion refers only to the value of the object, not to its representation.

(Elsewhere, the standard says that function argument passing and returning a value from a function behave like simple assignment.)
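To illustrate the difference between value-based assignment and copying the representation, here is a hypothetical model in which an unsigned 20-bit type is stored in a `uint32_t` whose upper 12 bits are padding. `uint20_value` plays the role of lvalue conversion and `uint20_assign` the role of simple assignment; both names, and the choice to write zero padding on assignment, are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical model: an unsigned 20-bit integer type stored in 32 bits.
   Bits 0..19 are value bits, bits 20..31 are padding. */
typedef uint32_t uint20_repr;

#define UINT20_VALUE_MASK UINT32_C(0x000FFFFF)

/* Lvalue conversion: only the value bits take part. */
uint32_t uint20_value(const uint20_repr *obj)
{
    return *obj & UINT20_VALUE_MASK;
}

/* Simple assignment: stores the value of the source. This model writes
   zero padding, but any padding pattern would be equally conforming. */
void uint20_assign(uint20_repr *dst, const uint20_repr *src)
{
    *dst = uint20_value(src);
}
```

After `uint20_assign(&b, &a)` the two objects hold the same value, yet their representations may differ; only a byte-wise copy such as `memcpy` would reproduce the padding too.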

Answer by too honest for this site:

In general, the standard places some constraints on the size of a type. The basic constraint is that it has to be a multiple of the size of char, with sizeof(char) defined as 1.
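Since sizeof counts whole bytes, the number of padding bits in an unsigned integer type can be computed as `sizeof(T) * CHAR_BIT` minus its value bits, which in turn can be read off the type's maximum value. A sketch for unsigned int (the helper names are made up for this example): on common hosted implementations it reports 0, while the question's hypothetical 20-bit type in 4 bytes would have 32 - 20 = 12 padding bits.

```c
#include <limits.h>

/* sizeof(char) is 1 by definition; every object size is a multiple of it. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* An unsigned type with N value bits has maximum 2^N - 1,
   so counting the set bits of the maximum yields N. */
int value_bits_from_max(unsigned long long max)
{
    int n = 0;
    while (max != 0) {
        n += (int)(max & 1u);
        max >>= 1;
    }
    return n;
}

/* Unsigned types have no sign bit, so every non-value bit is padding. */
int unsigned_int_padding_bits(void)
{
    return (int)(sizeof(unsigned int) * CHAR_BIT) - value_bits_from_max(UINT_MAX);
}
```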

For padding bits within a type, refer to 6.2.6.1, which leaves the representation mostly implementation-defined. 6.2.6.2p5 states that the value of padding bits is unspecified, so there is no need to preserve them, but there are two important constraints on the padding bits:

  1. A non-negative value in a signed integer type shall have the same representation as the same value in the corresponding unsigned type. This guarantees compatibility between the signed and unsigned variants of a type for values within the range of the signed variant.
  2. If all bits (including padding bits) are zero, the object represents the value 0, so all-zero padding bits must be valid in that case. However, the reverse is not true: the value 0 need not be represented with all bits zero (thanks to MattMcNabb).
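Both constraints can be probed on whatever implementation you compile this on. A sketch with illustrative function names: on a typical implementation without integer padding bits both checks return 1; on an implementation with padding bits, `memcmp` in the first check would also compare those bits, whose values are unspecified.

```c
#include <string.h>

/* Constraint 1: a non-negative int and the same value converted to
   unsigned int should have the same object representation. */
int same_repr_as_unsigned(int v)   /* precondition: v >= 0 */
{
    unsigned int u = (unsigned int)v;
    /* int and unsigned int are required to have the same size */
    return memcmp(&v, &u, sizeof v) == 0;
}

/* Constraint 2: an int whose bits (padding included) are all zero is 0. */
int all_bits_zero_is_zero(void)
{
    int x;
    memset(&x, 0, sizeof x);
    return x == 0;
}
```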

Both constraints involve padding bits, as they are part of the internal representation. From a more practical point of view, padding bits should be set to zero unless they are parity (or similar) bits that depend on the other bits; even then, the second constraint has to be met.
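A parity bit is one way a padding bit can legitimately depend on the other bits. The following is a hypothetical sketch, not a description of any real implementation: a 20-bit unsigned value in bits 0 to 19, bit 20 used as an even-parity padding bit, and bits 21 to 31 as ordinary padding. Because the parity of zero is 0, the all-bits-zero object is still a valid representation of 0, so the second constraint holds.

```c
#include <stdint.h>

/* Hypothetical layout: bits 0..19 hold an unsigned 20-bit value,
   bit 20 is an even-parity padding bit, bits 21..31 are ordinary padding. */
#define VALUE_MASK UINT32_C(0x000FFFFF)
#define PARITY_BIT UINT32_C(0x00100000)

/* 1 if an odd number of value bits are set, else 0. */
uint32_t parity20(uint32_t value)
{
    uint32_t p = 0;
    for (value &= VALUE_MASK; value != 0; value >>= 1)
        p ^= value & 1u;
    return p;
}

/* Encode a value: set the parity bit so the total count of 1s is even,
   and leave all other padding bits zero. */
uint32_t encode_with_parity(uint32_t value)
{
    value &= VALUE_MASK;
    return parity20(value) ? (value | PARITY_BIT) : value;
}

/* In this model a wrong parity bit would be a trap representation;
   bits 21..31 are ignored. */
int parity_ok(uint32_t repr)
{
    return ((repr & PARITY_BIT) != 0) == (parity20(repr) != 0);
}
```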

That is a rough interpretation. For details, refer to the rest of the cited sections.

On MSP430X, a 20-bit int is of little practical use. The 20-bit registers are mostly meant to extend the addressing range, not for integer arithmetic (although the instruction set apparently supports it; I was wrong here in a former edit).

Pointers have a size of 32 bits (four 8-bit bytes), but only use 20 bits. Some embedded compilers might support special short/near/... qualifiers, effectively providing two different pointer sizes. This is, however, actually against the standard. (I'm a bit ambivalent here: optimization versus portability.)

MSP430X is one of the platforms where using the dedicated types from stdint.h (e.g. uintptr_t) and stddef.h (e.g. size_t) is essential, as casting a pointer to or from int will eventually fail. It also makes clear why the standard's only requirement for (u)intptr_t is temporary storage of a pointer value, with no operations defined on it. Consequently, nothing is guaranteed about the padding bits, not even for the null pointer.
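A minimal sketch of the one guarantee (u)intptr_t does provide (C11 7.20.1.4): a valid `void *` converted to `uintptr_t` and back compares equal to the original. `uintptr_t` is optional, so the code tests for `UINTPTR_MAX`; the function name is hypothetical. The integer obtained from a null pointer is implementation-defined and need not be 0.

```c
#include <stdint.h>

/* Returns 1 if p survives a round trip through uintptr_t. */
int pointer_roundtrip_ok(void *p)
{
#ifdef UINTPTR_MAX
    uintptr_t u = (uintptr_t)p;   /* storage only: no arithmetic on u */
    void *back = (void *)u;
    return back == p;
#else
    (void)p;                      /* uintptr_t not provided here */
    return 1;
#endif
}
```

Nothing in the standard says what the bits of `u` look like, which is why the guarantee is limited to this round trip.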

The reason for this large overhead (37.5% unused bits) is that the MSP430X has no instructions to read or write 20-bit or even 24-bit values from or to memory (and such values would make array indexing very costly). Only some constants can be 20 bits wide: they are encoded using an extension word that carries 4 bits, while the remaining 16 bits follow the opcode as for other instructions. This is likely one of the last (small) architectures to show how much additional effort address space expansion requires while maintaining compatibility.
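The arithmetic behind that paragraph can be sketched as follows: 12 unused bits out of 32 gives 37.5%, and a 20-bit constant splits into a 4-bit part (carried by the extension word) and a 16-bit part (following the opcode). The function names are illustrative, and this sketch deliberately does not model where the 4 bits sit inside the extension word.

```c
#include <stdint.h>

/* Fraction of unused bits when used_bits are stored in stored_bits. */
double unused_bit_ratio(unsigned used_bits, unsigned stored_bits)
{
    return (double)(stored_bits - used_bits) / stored_bits;
}

/* Split a 20-bit constant into the upper 4 bits (extension word)
   and the lower 16 bits (the word following the opcode). */
void split_imm20(uint32_t imm, unsigned *hi4, uint16_t *lo16)
{
    *hi4 = (unsigned)((imm >> 16) & 0xFu);
    *lo16 = (uint16_t)(imm & 0xFFFFu);
}

/* Reassemble the 20-bit constant from its two parts. */
uint32_t join_imm20(unsigned hi4, uint16_t lo16)
{
    return ((uint32_t)(hi4 & 0xFu) << 16) | lo16;
}
```

For example, `unused_bit_ratio(20, 32)` yields 0.375, and `0xABCDE` splits into `0xA` and `0xBCDE`.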

Note that the MSP430X has some additional pitfalls for 20-bit addressing modes. For instance, interrupt handlers have to reside in the lower 64 KiB, as the vector table only contains 16-bit entries. This actually prevents the vector table from being defined in C as an array of function pointers (function pointers occupy 4 bytes here and cannot be portably converted to 16-bit values and back).
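The 64 KiB restriction amounts to a simple range check. In this sketch, handler addresses are plain integers, because obtaining a function's address as an integer is implementation-specific; `make_vector_entry` is a hypothetical helper, not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* A vector table entry is 16 bits wide, so a (20-bit) handler address
   can only be stored if it lies in the lower 64 KiB. */
bool make_vector_entry(uint32_t handler_addr, uint16_t *entry)
{
    if (handler_addr > UINT32_C(0xFFFF))
        return false;            /* above 64 KiB: does not fit */
    *entry = (uint16_t)handler_addr;
    return true;
}
```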