On a C compiler which uses ASCII as its character set, the value of the character literal '??<' would be equivalent to that of '{', i.e. 0x7B. What would be the value of that literal on a compiler whose character set doesn't have a { character?
Outside a string literal, a compiler could infer that ??< is supposed to have the same meaning as an open-brace character is defined to have, even if the compiler's character set doesn't have an open-brace character. Indeed, the whole purpose of trigraphs is to allow sequences of representable characters to be used in place of characters that aren't representable. The spec requires that trigraphs be processed even within string literals, however, which has me puzzled. If a compiler's character set includes a { character, the compiler can allow '{' to be represented as '??<', but if the character set includes {, I see no reason a programmer wouldn't simply use that. If the character set doesn't include {, however, which would seem the only reason for using trigraphs in the first place, what representable character would a compiler be expected to replace ??< with?
When it comes to the environment, and to files in particular, the C standard is intentionally rather vague. The following guarantees are made about trigraphs and the encoding of their corresponding characters:
C11 (n1570) 5.1.1.2 p1 (“Translation phases”) [emph. mine]
Thus, the trigraph sequence must be mapped to a single byte. This single-byte character must be in the basic character set and must be different from any other character in the basic character set. How the compiler handles trigraphs internally during translation isn’t really observable behaviour, so it’s irrelevant.
If written to a text stream it may be converted (as I read it, maybe back to a trigraph sequence if the underlying encoding doesn’t have an encoding for a certain character). It can be read back again, and must compare equal if it is considered a printing character. Ibid. 7.21.2 p2:
Ibid. 7.4 p3:
And for binary streams, ibid. 7.21.2 p3:
In the comments above, the question arose if
always works for code generation and the output of that statement is guaranteed to be compilable. I couldn’t find a normative reference requiring isprint('??<') etc. (for (1)) or even isprint('<') etc. (for (2)) to return non-zero, but the C89 rationale about streams says:

When '??<' etc. is written to a binary stream, it must map to a single byte, be printed as such, be unique and distinguishable from any other basic character, and compare equal to '??<' when read back.

Related: C89 rationale about trigraphs.