Mapping between line:column numbers in C or C++ code before and after preprocessing


When there's a syntax error in a C or C++ source file, both GCC and Clang report the exact line and column of where the error is. For example, if we try to compile this code fragment:

#include <stdio.h>

static void f(char *arg, int ignored) {
        printf("%s\n", arg);
}

#define FUNCTION(x) f(x)

#define ARG "test"

int main() {
        FUNCTION(ARG);
        return 0;
}

then GCC will report the error at 7:21 (inside the macro body):

a.c: In function ‘main’:
a.c:7:21: error: too few arguments to function ‘f’
    7 | #define FUNCTION(x) f(x)
      |                     ^
a.c:12:9: note: in expansion of macro ‘FUNCTION’
   12 |         FUNCTION(ARG);
      |         ^~~~~~~~
a.c:3:13: note: declared here
    3 | static void f(char *arg, int ignored) {
      |

while Clang will point at the macro call site (12:2):

a.c:12:2: error: too few arguments to function call, expected 2, have 1
        FUNCTION(ARG);
        ^~~~~~~~~~~~~
a.c:7:24: note: expanded from macro 'FUNCTION'
#define FUNCTION(x) f(x)
                    ~  ^
a.c:3:13: note: 'f' declared here
static void f(char *arg, int ignored) {

The output may differ slightly across compiler flavours and versions, but GCC and Clang are largely consistent.

Now, as far as I understand, whenever a compiler driver such as gcc or clang++ is invoked, the preprocessor (gcc -E, or clang -E, or cpp) is run first, and the language-specific compiler front-end deals with already preprocessed code.

If we look at the preprocessed code, however, it will indeed be heavily annotated, but there will be only line numbers from the original file, with column number information totally missing. Example:

# 885 "/usr/include/stdio.h" 3 4
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 909 "/usr/include/stdio.h" 3 4

# 2 "a.c" 2


# 3 "a.c"
static void f(char *arg, int ignored) {
 printf("%s\n", arg);
}

So, looking at the above, we can deduce that __uflow(...) is actually declared at /usr/include/stdio.h:885 (which is true), and the definition of f(...) starts at a.c:3 (which is also true). You can also see that the code is reformatted (the indentation level has changed). The format of the preprocessor output is described in more detail here.

Now, the question is: how exactly does a compiler front-end translate locations between the preprocessed and the original code when reporting feedback to the user, given that the preprocessor output apparently contains insufficient information?

How can a third-party tool (e.g., a static analyzer) do the same, provided it can only access the original and the preprocessed code (or, alternatively, the AST of the preprocessed code)?

Is this possible using the libclang API?
