What is the difference between float, _Float32, _Float32x, and _Float32_t?


C23 introduced a number of floating-point types, including but not limited to:

  • _Float32
  • _Float32x
  • _Float32_t

I am unsure of the differences, such as:

  • Are they keywords, or are they type aliases, or something else?
  • Are they distinct types, or can they be aliases for float?
  • What is the minimum range and precision of these types?
  • Are they required to be IEEE-754-compliant (or IEC 60559)?
  • Is float obsoleted by _Float32 or other types?

The same questions apply to _Float64 vs double, and _Float128 vs long double.

Best answer

Only _FloatN_t types (e.g. _Float32_t) are aliases from the <math.h> header. All the other types are required to be distinct, and their names are keywords. (See H.5.1 [Keywords])

All of the types fall into one of four categories (see below). Choose between them as follows:

  • float, double, and long double, if you are satisfied with the very lenient requirements of these types
    • alternatively, check whether __STDC_IEC_60559_BFP__ is defined, which imposes stricter requirements on them
    • also, use float and double if you are okay with them having the same representation 1)
    • also, you must use these types for compatibility with pre-C23 compilers
  • _FloatN if you need a specific IEC 60559 type with exactly N bits
  • _FloatNx if you need an extended IEC 60559 type with at least N bits of precision
    • especially if you want to store N-bit integers in a floating-point number with no loss
  • _FloatN_t if you don't need IEC 60559 types, and you are not satisfied with the minimum requirements for float and double

1) On architectures without a double-precision FPU, float and double might have the same size and representation (e.g. on some Arduino boards). Use other types (e.g. _Float64_t over double) if you want software emulation of double precision instead.
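
The macro check and the footnote can both be tested in code. The following is a minimal sketch that relies only on <float.h> and on the feature-test macro named above; the function names are hypothetical and chosen for illustration.

```c
#include <assert.h>   /* provides the static_assert macro before C23 */
#include <float.h>
#include <stdbool.h>

/* double must be able to represent every float, so it can never
   have fewer significand digits than float. */
static_assert(DBL_MANT_DIG >= FLT_MANT_DIG, "double is narrower than float");

/* True if the implementation defines __STDC_IEC_60559_BFP__, in which
   case float and double are represented like _Float32 and _Float64. */
bool standard_types_are_iec60559(void)
{
#ifdef __STDC_IEC_60559_BFP__
    return true;
#else
    return false;
#endif
}

/* True if double carries more significand digits than float.
   On a target where both share one format this returns false. */
bool double_is_wider_than_float(void)
{
    return DBL_MANT_DIG > FLT_MANT_DIG;
}
```

If double_is_wider_than_float() returns false on your target, you are in the situation the footnote describes, and double gives you nothing beyond float.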

Standard floating types

float, double, and long double are collectively called standard floating types. Their representation is implementation-defined, but there are some requirements nonetheless:

  • double must be able to represent any float, and long double must represent any double
  • if __STDC_IEC_60559_BFP__ is defined, float and double are represented like _Float32 and _Float64
  • they must be able to represent a minimum number of decimal digits with no loss, and must cover a minimum range of values

Type        | Minimum decimal digits | Minimum           | Maximum
------------|------------------------|-------------------|------------------
float       | FLT_DECIMAL_DIG ≥ 6    | FLT_MIN ≤ 10^-37  | FLT_MAX ≥ 10^37
double      | DBL_DECIMAL_DIG ≥ 10   | DBL_MIN ≤ 10^-37  | DBL_MAX ≥ 10^37
long double | LDBL_DECIMAL_DIG ≥ 10  | LDBL_MIN ≤ 10^-37 | LDBL_MAX ≥ 10^37
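
These minimums can be written down as compile-time checks. The sketch below assumes a C11 or later compiler (for static_assert and the *_DECIMAL_DIG macros). It expresses the range limits through the integer macros *_MIN_10_EXP and *_MAX_10_EXP from <float.h>, because static_assert needs an integer constant expression. float_range_ok is a hypothetical helper that repeats the float check at run time with the values from the table.

```c
#include <assert.h>   /* provides the static_assert macro before C23 */
#include <float.h>

/* Decimal-digit minimums from the table. */
static_assert(FLT_DECIMAL_DIG  >= 6,  "float: too few decimal digits");
static_assert(DBL_DECIMAL_DIG  >= 10, "double: too few decimal digits");
static_assert(LDBL_DECIMAL_DIG >= 10, "long double: too few decimal digits");

/* Range minimums, written with the integer exponent macros. */
static_assert(FLT_MIN_10_EXP  <= -37 && FLT_MAX_10_EXP  >= 37, "float: range");
static_assert(DBL_MIN_10_EXP  <= -37 && DBL_MAX_10_EXP  >= 37, "double: range");
static_assert(LDBL_MIN_10_EXP <= -37 && LDBL_MAX_10_EXP >= 37, "long double: range");

/* Run-time version of the float range check.
   Returns 1 if float covers at least [1e-37, 1e37]. */
int float_range_ok(void)
{
    return FLT_MIN <= 1e-37f && FLT_MAX >= 1e37f;
}
```

On a conforming implementation none of these assertions can fire; they mainly document how little the standard promises.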

Usually, float and double are binary32 and binary64 respectively, and long double is binary128, an x87 80-bit extended floating-point number, or has the same representation as double.

See C23 Standard - E [Implementation limits]

Interchange floating types

_Float32, _Float64, etc. are so-called interchange floating types. Their representation must follow an IEC 60559 interchange format for binary floating-point numbers, such as binary32, binary64, etc. Any _FloatN type must be exactly N bits wide.

The types _Float32 and _Float64 are only guaranteed to exist if the implementation defines both __STDC_IEC_60559_BFP__ and __STDC_IEC_60559_TYPES__. If so:

  • _Float32 exists, and float has the same size and alignment as it (but is a distinct type)
  • _Float64 exists, and double has the same size and alignment as it (but is a distinct type)
  • a wider _FloatN (typically _Float128) exists if long double is a binaryN type with N > 64
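
A minimal sketch of how code might use the interchange types only when the implementation announces them. The typedef names f32 and f64 and both helper functions are hypothetical. In the fallback branch, float and double are mere stand-ins and are not guaranteed to use an IEC 60559 format.

```c
#include <assert.h>   /* provides the static_assert macro before C23 */

#if defined(__STDC_IEC_60559_BFP__) && defined(__STDC_IEC_60559_TYPES__)
/* _Float32 and _Float64 are keywords; no header is needed to name them. */
typedef _Float32 f32;
typedef _Float64 f64;
#define HAVE_INTERCHANGE_TYPES 1
/* Same size and alignment as float and double, as listed above. */
static_assert(sizeof(_Float32) == sizeof(float), "size mismatch");
static_assert(sizeof(_Float64) == sizeof(double), "size mismatch");
static_assert(_Alignof(_Float32) == _Alignof(float), "alignment mismatch");
static_assert(_Alignof(_Float64) == _Alignof(double), "alignment mismatch");
#else
/* Stand-ins only: no IEC 60559 guarantee in this branch. */
typedef float  f32;
typedef double f64;
#define HAVE_INTERCHANGE_TYPES 0
#endif

/* 1 if the guarded branch above was taken, 0 otherwise. */
int interchange_types_available(void)
{
    return HAVE_INTERCHANGE_TYPES;
}

/* 1 if f32 names a type other than float. _Generic selects by type, so
   this shows that _Float32 is distinct from float despite the same layout. */
int f32_is_distinct_from_float(void)
{
    return _Generic((f32)0, float: 0, default: 1);
}
```

Because the types are distinct, overload-style dispatch with _Generic can list float and _Float32 as separate associations without a conflict.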

See C23 Standard - H.2.1 [Interchange floating types].

Extended floating types

_Float32x, _Float64x, etc. are so-called extended floating types (named after IEC 60559 extended precision). Unlike their interchange counterparts, they only have minimum requirements for their representation, not exact requirements. A _FloatNx must have ≥ N bits of precision, making it able to represent N-bit integers with no loss.

These types are only guaranteed to exist if the implementation defines __STDC_IEC_60559_TYPES__. If so:

  • _Float32x exists if __STDC_IEC_60559_BFP__ is also defined, and may have the same format as double (but is a distinct type)
  • _Float64x and _Float128x are optional; if _Float64x exists, it may have the same format as long double (but is a distinct type)
  • __STDC_IEC_60559_DFP__ concerns the decimal extended types (_Decimal64x is then required, _Decimal128x is optional), not _Float64x

The extra precision and range often mitigate round-off error and eliminate overflow and underflow in intermediate computations.
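
The "N-bit integers with no loss" property can be illustrated with 32-bit integers. The sketch below uses _Float32x when the implementation announces it and otherwise falls back to double as a stand-in, which is an assumption checked by a static_assert. The names wide32_t, u32_exact_in_float, and u32_exact_in_wide are hypothetical.

```c
#include <assert.h>   /* provides the static_assert macro before C23 */
#include <float.h>
#include <stdint.h>

#if defined(__STDC_IEC_60559_BFP__) && defined(__STDC_IEC_60559_TYPES__)
typedef _Float32x wide32_t;   /* at least 32 bits of precision */
#else
typedef double wide32_t;      /* stand-in, verified on the next line */
static_assert(FLT_RADIX == 2 && DBL_MANT_DIG >= 32,
              "double cannot stand in for _Float32x on this target");
#endif

/* Returns 1 if v is unchanged after being stored in a float. */
int u32_exact_in_float(uint32_t v)
{
    volatile float f = (float)v;   /* volatile drops any excess precision */
    /* Convert to uint64_t: a float near 2^32 would not fit in uint32_t. */
    return (uint64_t)f == (uint64_t)v;
}

/* Returns 1 if v is unchanged after being stored in wide32_t. */
int u32_exact_in_wide(uint32_t v)
{
    volatile wide32_t w = (wide32_t)v;
    return (uint64_t)w == (uint64_t)v;
}
```

With a binary32 float, 16777217 (2^24 + 1) is the first positive integer that fails u32_exact_in_float, whereas u32_exact_in_wide succeeds for every uint32_t value.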

See C23 Standard - H.2.3 [Extended floating types]

Aliases

_Float32_t, _Float64_t, etc. are aliases for other floating types, so that:

  • _FloatN_t has at least the range and precision of the corresponding real floating type (e.g. _Float32_t has at least the range and precision of _Float32 if it exists)
  • a wider type can represent all values of a narrower one (e.g. _Float64_t can represent _Float32_t)
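
A hedged sketch of how the aliases might be used. It assumes, as I read Annex H, that <math.h> only declares them when __STDC_WANT_IEC_60559_TYPES_EXT__ is defined before the first include; treat that as an assumption to verify against your implementation. When __STDC_IEC_60559_TYPES__ is not defined, the sketch falls back to float_t and double_t, the analogous aliases that <math.h> has provided since C99. The names calc32_t, calc64_t, and widen are hypothetical.

```c
#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1   /* assumed opt-in, see above */
#include <assert.h>   /* provides the static_assert macro before C23 */
#include <math.h>

#if defined(__STDC_IEC_60559_TYPES__)
typedef _Float32_t calc32_t;
typedef _Float64_t calc64_t;
#else
/* Stand-ins: the C99 aliases float_t and double_t. */
typedef float_t  calc32_t;
typedef double_t calc64_t;
#endif

/* Sanity check for this sketch, not a guarantee quoted from the standard. */
static_assert(sizeof(calc64_t) >= sizeof(calc32_t),
              "wider alias is smaller than the narrower one");

/* The wider alias can represent every value of the narrower one,
   so this conversion never changes the value. */
calc64_t widen(calc32_t x)
{
    return (calc64_t)x;
}
```

Since these are aliases rather than distinct types, calc32_t may turn out to be float, double, or something wider, depending on the implementation.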

See C23 Standard - H.11 [Mathematics <math.h>].