Examples when space between C++ or C operators must not be removed

I'm trying to come up with all examples when removing whitespace between operators in valid C or C++ code changes its meaning (by changing it to code which does something else or which doesn't compile).

I went through the operators listed at http://en.wikipedia.org/wiki/Operators_in_C_and_C++ and could come up with:

+ +:

int f(int a) { return a + /**/ +5; }

- -:

int f(int a) { return a - /**/ -5; }

I tried 1& &p as well, but I couldn't make it compile with the space; I always got type errors.

I'm looking for answers in the following format: a valid C or C++ code snippet with /**/ and spaces between two operators (see two examples above). The removal of /**/ and the surrounding whitespace must produce a compile error or change the meaning of the compiled program.

Some context why I need it: I'm writing a C and C++ source translator which removes unnecessary whitespace. To make it correct, I need to understand when whitespace can't be removed without changing the meaning of the code.
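One conservative way to frame the check such a translator needs is sketched below. It only looks at pairs of punctuators (identifiers, numbers, and literals would need their own rules). The names `needs_space` and `kLongTokens` are hypothetical, the token list is illustrative and should be checked against the standard being targeted, and a C++17 compiler is assumed for `std::string_view`.

```cpp
#include <string_view>

// Multi-character punctuators plus the two comment openers.
// Assumption: illustrative list; verify it against the standard you target.
constexpr std::string_view kLongTokens[] = {
    "++", "--", "->", "->*", "<<", ">>", "<=", ">=", "<=>", "==", "!=",
    "&&", "||", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<=",
    ">>=", "::", ".*", "...", "##", "<:", ":>", "<%", "%>", "%:", "%:%:",
    "//", "/*",
};

// Hypothetical helper: returns true when writing `right` directly after
// `left` would let the lexer read a longer token that starts with all of
// `left` and continues into `right`, so the separating whitespace must stay.
constexpr bool needs_space(std::string_view left, std::string_view right) {
    if (left.empty()) return false;
    for (std::string_view tok : kLongTokens) {
        if (tok.size() <= left.size()) continue;
        const std::string_view rest = tok.substr(left.size());
        if (tok.substr(0, left.size()) == left &&
            right.substr(0, rest.size()) == rest) {
            return true;
        }
    }
    return false;
}

// Compile-time checks of the cases discussed in the question.
static_assert(needs_space("+", "+"), "+ + must not become ++");
static_assert(needs_space("-", "-"), "- - must not become --");
static_assert(needs_space("/", "*"), "/ * must not open a comment");
static_assert(!needs_space("+", "-"), "+- forms no longer token");
```

The helper errs on the side of keeping whitespace: it also returns true for `>` followed by `>`, even in contexts where gluing them might happen to be harmless.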

There are 4 best solutions below

4
On

Here's an example to serve as a sort of proof that your examples are not the only ones. But surely there are plenty more that can be created, so I won't try to enumerate them.

#include <cassert>

struct C {
    operator bool() const { return true; }
};

bool operator&(C& l, C* r) { return false; }

int main()
{
    C a, b;
    assert(a &&b); // passes; written as "a & &b" it calls operator& above and the assert fails
}
0
On

I think the answer to your question is "no, there are no further cases" where code that compiles changes meaning because of a space within the characters of an operator, but I would have to carefully enumerate all operators and use cases before saying I'm sure.

One thing is for sure: the comments about how easy this question is are best ignored.

2
On

In order to fully answer or understand this question, you need to understand how C++ tokenizing and parsing works.

Here's what's required for this to work: a multicharacter token that can be cut into smaller tokens. After that, you just have to find a way to make both versions compile.
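As a sketch of that idea, the pairs below differ only in whitespace, yet the lexer's "longest token wins" rule makes them tokenize differently. All function names are made up for illustration, and a C++14-or-later compiler is assumed so that the functions can be checked with static_assert.

```cpp
// Assumption: C++14 or later. All names here are hypothetical.

// "a + ++b" lexes as  a  +  ++  b : pre-increment b, then add.
constexpr int spaced_plus(int a, int b) {
    return a + ++b;
}

// "a+++b" lexes as  a  ++  +  b : the longest token "++" is taken first,
// so this is (a++) + b.
constexpr int glued_plus(int a, int b) {
    return a+++b;
}

// "x / *p" divides x by the value p points to. The trailing "// */" is an
// ordinary line comment here.
constexpr int spaced_divide(int x, const int* p) {
    return x / *p  // */
        ;
}

// "x/*p" opens a block comment that runs to the "*/" at the end of the
// line, so all that is left of the statement is "return x ;".
constexpr int glued_divide(int x, const int* p) {
    (void)p;  // p is only mentioned inside the comment below
    return x/*p  // */
        ;
}

constexpr int kFour = 4;

static_assert(spaced_plus(1, 2) == 4, "1 + (2 + 1)");
static_assert(glued_plus(1, 2) == 3, "(1++) + 2");
static_assert(spaced_divide(8, &kFour) == 2, "8 / 4");
static_assert(glued_divide(8, &kFour) == 8, "the division was commented out");
```

Both members of each pair compile, so a whitespace remover that glued these tokens together would silently change the program rather than break the build.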

Let's start at the beginning, trigraph sequences:

Table 1 — Trigraph sequences
??= #    ??( [    ??< {
??/ \    ??) ]    ??> }
??' ^    ??! |    ??- ~

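The question mark, which also serves as the conditional operator, cannot normally be doubled up in the language. However, there are a few places where a doubled question mark is valid: inside a string literal, for instance, or inside the d-char-sequence of a raw string literal. So we can construct an example where breaking up a trigraph completely alters how a line is parsed. When the compiler performs trigraph replacement, the ??/ in the second line below becomes a backslash, which escapes the closing quote; the string literal then runs on to the quote after R, and the rest of the line becomes a // comment. In the first line the space prevents the replacement, so the raw string is printed.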

#include <iostream>

int main() {
  const char* asdf  = "?? /"; std::cout << R";//"(content);//"" << std::endl;
  const char* asdf2 = "??/"; std::cout << R";//"(content);//"" << std::endl;
  std::cout << asdf  << std::endl;
  std::cout << asdf2 << std::endl;

  return 0;
}

Unfortunately, because there are so few places where you can shoehorn in a ?? or a ? ?, the possibilities are a bit limited. (That is one of the reasons ?? sequences were chosen for trigraphs: very little valid C++ code features them, and legacy is important to the committee.) I could go on from here. Trigraphs are probably the most challenging example, but I'll leave the rest as an exercise for the reader.

0
On

In C++ there is (or was) the notorious problem of < and > serving both as comparison operators and as template brackets. There are (or have been) cases where whitespace changes the semantics without necessarily causing a compile-time error. Examples of that are quite involved; I once wrote up such a case here:

template< int len > int fun(int x);
typedef int (*fun_t)(int);
template< fun_t f > int fon(int x);

void total(void) {
   int A = fon< fun< 9 > >(1) >>(2);
   int B = fon< fun< 9 >>(1) > >(2);
}

For A we take the function fon that depends on function pointer fun<9> and call that with argument 1. The result is then shifted by 2 to the right.

In contrast, for B we call the function fon that depends on function pointer fun<4> and pass it the argument 2. (The 4 is the result of shifting 9 by one to the right.)

So just because some blanks are placed differently, the result can be completely different.

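All of this changed with C++11, where the rules for template angle brackets were modified: a >> inside a template argument list is now read as two closing brackets, so the line for B no longer parses as the shift described above.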
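A minimal sketch of the C++11 behaviour, assuming a C++11-or-later compiler. The templates `Value` and `Wrap` are hypothetical and are not part of the example above.

```cpp
#include <type_traits>

// Hypothetical template that just exposes its integer argument.
template <int N>
struct Value {
    static constexpr int value = N;
};

// Since C++11 a shift inside a template argument list needs parentheses;
// without them the ">>" would be read as two closing angle brackets.
static_assert(Value<(9 >> 1)>::value == 4, "9 shifted right by one is 4");

// Hypothetical wrapper used to build a nested template argument list.
template <typename T>
struct Wrap {
    using type = T;
};

// Since C++11 nested template argument lists can be closed with ">>"
// directly; C++03 required "> >" here.
static_assert(std::is_same<Wrap<Wrap<int>>::type::type, int>::value,
              "Wrap<Wrap<int>> unwraps to int");
```

For a whitespace remover this means the answer depends on the language version: the space in "> >" is required when targeting C++03 and optional from C++11 on.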