Can a type in C have more than one object representation?


The C99 standard, section 6.2.6.1, paragraph 8, states:

Where an operator is applied to a value that has more than one object representation, which object representation is used shall not affect the value of the result. (footnote 43) Where a value is stored in an object using a type that has more than one object representation for that value, it is unspecified which representation is used, but a trap representation shall not be generated.

I understood object to mean a location (bytes) in memory and value as the interpretation of those bytes based on the type used to access it. If so, then:

  1. How can a value have more than one object representation?
  2. How can a type have more than one object representation for a value?

The standard adds the below in the footnote:

[image not preserved: footnote 43 of C99 6.2.6.1]

Still, it's not clear to me. Can someone please simplify it for me and explain with examples?

4


7
Michael Kenzel On BEST ANSWER

An object is a region of storage (memory) that can contain values of a certain type [C18 3.15].

An object representation is the sequence of Bytes that makes up the contents of an object [C18 6.2.6.1].

Not every possible combination of Bytes in an object representation also has to correspond to a value of the type (an object representation that doesn't is called a trap representation [C18 3.19.4]).

And not all the Bits in an object representation have to participate in representing a value. Consider the following type:

struct A
{
    char c;
    int n;
};

Compilers are allowed to (and generally will) insert padding Bytes between the members c and n of this struct to ensure correct alignment of n. These padding Bytes are part of an object of type struct A. They are, thus, part of the object representation. But the values of these padding Bytes do not have any effect on the logical value of type struct A that is stored in the object.

Let's say we're on a target platform where Bytes consist of 8 Bits, an int consists of 4 Bytes in little endian order, and there are 3 padding Bytes between c and n to ensure that n starts at an offset that is a multiple of 4. The value (struct A){42, 1} may be stored in an object as

2A 00 00 00 01 00 00 00

But it may as well be stored in an object as

2A FF FF FF 01 00 00 00

or whatever else the padding Bytes may happen to be. Each of these sequences of Bytes is a valid object representation of the same logical value of type struct A.

This is also what the footnote is about. If you had two objects x and y that each contained a different object representation of the same value of type struct A, then comparing them member by member (x.c == y.c && x.n == y.n, since C does not define == for whole structs) will evaluate to true, while a memcmp() will not report them as equal, since memcmp() simply compares the Bytes of the object representation without any consideration as to what the logical value stored in these objects actually is…

4
KamilCuk On

explain with examples?

For example, take a compiler that represents the float type using decimal floating point according to the IEEE 754-2008 standard. Assuming that the stars are properly aligned (CHAR_BIT == 8, sizeof(int) == 4, floats are 32 bits wide with no padding bits, and the compiler uses little endian), the following code (tested with gcc 9.2 with -Dfloat=typeof(1.0df)):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
        float a, b;
        // simulate `a = 314` and `b = 314` with a compiler
        // that chose to use different object representation for the same value
        memcpy(&a, (int[1]){0x3280013a}, 4);  // coefficient 314, exponent 0
        memcpy(&b, (int[1]){0x32000c44}, 4);  // coefficient 3140, exponent -1
        printf("a = %d, b = %d\n", (int)a, (int)b);
        // compare once by value (==) and once by object representation (memcmp)
        printf("a %s b and memcmp(&a, &b) %s 0\n",
               a == b ? "==" : "!=",
               memcmp(&a, &b, sizeof(a)) == 0 ? "==" : "!=");
}

should (could) output:

a = 314, b = 314
a == b and memcmp(&a, &b) != 0
2
n. m. could be an AI On

A simple example of a value with more than one representation is an IEEE floating point zero. It has a "positive zero" and a "negative zero" representation.

Note: An implementation that conforms to IEC 60559 must distinguish between positive and negative zeros, so in such an implementation they are different values rather than different representations of the same value. However, an implementation doesn't need to conform to IEC 60559. Such implementations are allowed to e.g. always return the same value for signbit of zero, even though the underlying hardware distinguishes +0 and -0.

On a sign-and-magnitude machine, integer zeros also have more than one representation.

On a segmented architecture like the 16-bit 8086, "long" pointers have more than one representation, for example 0x0000:0x0010 and 0x0001:0x0000 are two representations of the same pointer value.

Finally, in any data type with padding, padding bits do not influence the value. Examples include structs with padding holes.

5
chux - Reinstate Monica On

How can a value have more than one object representation?
How can a type have more than one object representation for a value?

By not having each bit pattern correspond to a different value.

Typically one bit pattern is the preferred, canonical form, and the others are rarely generated by normal means.

  1. The x86 extended precision format contains bit patterns that are the same value of other bit patterns - even with the same sign. Research the "pseudo denormal" and "unnormal" bit patterns.

A side effect is that this 80-bit encoding does not realize 2^80 different values due to this redundancy (even after accounting for not-a-numbers).

  2. Using 2 doubles to encode a long double has a similar impact.

Oversimplified example of 2 double representing a long double value:

1000001.0 + 0.0 (canonical form) same value as 1000000.0 + 1.0

  3. Decimal floating point has this issue too.

Because the significand is not normalized, most values with less than 16 significant digits have multiple possible representations; 1×10^2 = 0.1×10^3 = 0.01×10^4, etc.


As multiple bit patterns for the same value reduce the gamut of possible numbers, such encodings tend to fall out of favor compared to non-redundant ones. An effect is that we do not see them as much these days.

A reason for their existence in the first place was to facilitate hardware realizations, or to keep the format simple and easy to define (let's explicitly encode the most significant digit for our new FP format - using an implied one is so confusing).


@Eric brought up an interesting comment concerning value and operator that hinges on:

Where an operator is applied to a value that has more than one object representation, which object representation is used shall not affect the value of the result. C17 § 6.2.6.1, paragraph 8

x = +0.0 and y = -0.0, which have the same numeric value of zero, would still qualify as having different values, as the operator / distinguishes them: 1.0/x != 1.0/y.

Still, the various FP examples above have many other cases where x and y have different bit patterns yet the same value.