Why is the 'auto' keyword useful for compiler writers in C?


I'm currently reading "Expert C Programming - Deep C Secrets", and just came across this:

The storage class specifier auto is never needed. It is mostly meaningful to a compiler-writer making an entry in a symbol table — it says "this storage is automatically allocated on entering the block" (as opposed to statically allocated at compiletime, or dynamically allocated on the heap). auto is pretty much meaningless to all other programmers, since it can only be used inside a function, but data declarations in a function have this attribute by default.

I saw that someone asked about the same thing here, but they don't have any answer and the link given in comments only explains why there's such a keyword in C, inherited from B, and the differences with C++11 or pre-C++11.

I'm posting anyway to focus on the part stating that the auto keyword is somehow useful in compiler writing: what is the idea, and what is the connection with a symbol table?

I really insist on the fact that I ask only about a potential usage when programming a compiler in C (not coding a C compiler).

To clarify, I asked this question because I'd like to know if there's an example of code where auto can be justified, because the author stated there would be, when writing compilers.

The whole point here is that I think I have understood auto (inherited from B, where it was mandatory, but useless in C), yet I can't imagine any example where using it is useful (or at least not useless).

It really seems that there isn't any reason at all to use auto, but is there any old source code or something like that corresponding to the quoted statements?


5
On BEST ANSWER

Author answer: I just emailed Mr Van der Linden, and here is what he said:

Yes, I agree with the people who answered on stack overflow. I don't know for certain, because I never used the language B, but it seems highly plausible to me that "auto" ended up in C because it was in B.

Even when I was professionally kernel and compiler programming in C in the 1980's, I never saw any code that I can recall that used "auto".

The key takeaway is that the auto keyword doesn't add any extra information, and thus is redundant and unneeded. It was a mistake to bring it into C!

I also asked for some explanation about what he meant by speaking about compiler writing and symbol table. Here is his response:

Say you are writing a compiler that will translate C source code into linker objects (object files that can be linked).

Whenever your lexer (front end of the compiler) finds a sequence of characters that form a user-defined symbol (might be a variable, might be a function name, might be a constant, etc), the compiler will store that name in a table called the "symbol table". It will also store everything else it knows about the symbol - if it is a variable, it will store its type, if a constant it will store the value, if a function it will note that it can be invoked, etc etc. It will also store the scope of the name (the lines of code in which this symbol is known). The symbol table is one of the core data structures of a compiler, and some of it is carried forward into the object file. The object file needs to know any names that are to be addressable by external code objects, so the linker can associate the use of a name with the object in which it is stored.

Then later, when the compiler comes across the same name, the compiler looks in the symbol table to see if it knows all about the name already. One of the useful items to store about a name is "where the compiler will allocate storage for it". That storage has to be maintained as long as the symbol remains in scope. So it is useful for the symbol table to know where it should allocate the storage at runtime. I gave 3 examples of different places where a variable might be stored. The "auto" keyword tells the compiler "this is a variable, and you should store this on the stack and its scope is the function it is declared in".

Only, the compiler doesn't need to be told this, because this is already true for all variables declared within a function. I hope this explanation makes sense.
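
To make the description above concrete, here is a minimal sketch of a symbol table that records where storage for each name lives. Every name in it (storage_kind, symbol, symtab_insert, symtab_lookup) is hypothetical and chosen for illustration; real compilers use far richer structures, usually hash tables with one table per scope.

```c
#include <stddef.h>
#include <string.h>

/* The three storage places listed in the quoted text. */
enum storage_kind { STORAGE_AUTOMATIC, STORAGE_STATIC, STORAGE_DYNAMIC };

/* One symbol-table entry (hypothetical layout, for illustration only). */
struct symbol {
    const char *name;
    const char *type_name;
    enum storage_kind storage;
    int scope_depth;            /* 0 = file scope, 1 and up = nested blocks */
};

#define MAX_SYMBOLS 64

struct symbol_table {
    struct symbol entries[MAX_SYMBOLS];
    size_t count;
};

/* Record a newly declared name; returns NULL when the table is full. */
struct symbol *symtab_insert(struct symbol_table *t, const char *name,
                             const char *type_name,
                             enum storage_kind storage, int scope_depth)
{
    if (t->count >= MAX_SYMBOLS)
        return NULL;
    struct symbol *s = &t->entries[t->count++];
    s->name = name;
    s->type_name = type_name;
    s->storage = storage;
    s->scope_depth = scope_depth;
    return s;
}

/* Find the most recently declared entry with this name, or NULL. */
struct symbol *symtab_lookup(struct symbol_table *t, const char *name)
{
    for (size_t i = t->count; i > 0; i--) {
        if (strcmp(t->entries[i - 1].name, name) == 0)
            return &t->entries[i - 1];
    }
    return NULL;
}
```

Searching from the newest entry backwards is a simple way to let an inner declaration shadow an outer one with the same name; this sketch never removes entries when a scope ends, which a real compiler would have to do.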

I had completely misunderstood his statements: I thought auto might have some use when writing a compiler in C, in the code dealing with the symbol table. What he actually meant is that auto is useless, but C compiler writers must still handle and understand it. I nevertheless asked him to confirm my mistake, and it was indeed a misunderstanding on my part:

Perhaps the best way to think about this is:

  1. "auto" has no semantic effect in C
  2. we think it came over from B, but don't know for sure.
  3. It conveys info to someone writing a compiler for C code.
  4. But that info is a duplicate of other info that the compiler writer has.
  5. So a compiler writer can take note of either piece of info to update the symbol table
  6. Or indeed, they can check that the two pieces of info are consistent, and if not, issue an error message.
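
Points 5 and 6 can be sketched as a small function that derives the storage duration from two sources, the position of the declaration and the explicit specifier, and reports an error when they disagree. The names (specifier, duration, resolve_duration) are hypothetical, and the rules shown are a simplified pre-C23 view.

```c
#include <stdbool.h>

/* Storage-class specifier written in the source, if any. */
enum specifier { SPEC_NONE, SPEC_AUTO, SPEC_REGISTER, SPEC_STATIC, SPEC_EXTERN };

/* Result of the check; DUR_ERROR means the two pieces of info conflict. */
enum duration { DUR_AUTOMATIC, DUR_STATIC, DUR_ERROR };

enum duration resolve_duration(bool inside_function, enum specifier spec)
{
    /* First piece of info: where the declaration appears. */
    enum duration from_scope = inside_function ? DUR_AUTOMATIC : DUR_STATIC;

    /* Second piece of info: the explicit specifier. */
    switch (spec) {
    case SPEC_NONE:
        return from_scope;
    case SPEC_AUTO:
    case SPEC_REGISTER:
        /* Only consistent with a declaration inside a function. */
        return inside_function ? DUR_AUTOMATIC : DUR_ERROR;
    case SPEC_STATIC:
    case SPEC_EXTERN:
        return DUR_STATIC;
    }
    return DUR_ERROR;
}
```

Under these assumptions, resolve_duration(true, SPEC_AUTO) and resolve_duration(true, SPEC_NONE) give the same answer, which is exactly why the keyword carries no extra information, while resolve_duration(false, SPEC_AUTO) is the inconsistent case that deserves an error message.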
15
On

As far as I can tell from 40+ years of C programming, including compiler work, the auto keyword has been completely useless in C for 50 years.

To answer your precise question, Why is auto keyword useful for compiler-writers in C? It isn't useful at all; C compiler writers are just required to parse it as a keyword and implement its semantics as a storage class specifier.

It seems to be a leftover from B, the predecessor to the C language, developed by Ken Thompson and Dennis Ritchie at Bell Labs in the late sixties and early seventies. I have never used B and I doubt Peter, whom I met in 1984 at Inria, has either.

Before C23, auto can only be used to specify automatic storage class for definitions in the scope of a function. This is the default, so auto is fully redundant and as long as the type or another qualifier is specified, auto can be removed. There isn't any case where it was needed, so its inclusion in the C Standard is only rooted in the early history of the C language.
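
A short illustration of that redundancy (the function names are made up for the example): both functions below declare block-scope variables with automatic storage, and removing the keyword changes nothing.

```c
/* Explicit auto: allowed inside a function, but adds no information. */
int sum_with_auto(void)
{
    auto int a = 2;
    auto int b = 3;
    return a + b;
}

/* Same declarations without the keyword: identical meaning. */
int sum_without_auto(void)
{
    int a = 2;
    int b = 3;
    return a + b;
}
```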

auto has been used in C++ since C++11 to enable type inference in variable definitions, with or without automatic storage, where the compiler detects the type from that of the initializer.

With the current trend pushing for convergence on a common subset for the C and C++ languages, new semantics have been attached to this keyword in C23 modelled after the C++ semantics, but more restricted:

6.7.1 Storage-class specifiers

auto may appear with all the others except typedef;

auto shall only appear in the declaration specifiers of an identifier with file scope or along with other storage class specifiers if the type is to be inferred from an initializer.

If auto appears with another storage-class specifier, or if it appears in a declaration at file scope, it is ignored for the purposes of determining a storage duration or linkage. It then only indicates that the declared type may be inferred.

Type inference is specified as:

6.7.9 Type inference

Constraints

1 A declaration for which the type is inferred shall contain the storage-class specifier auto.

Description

2 For such a declaration that is the definition of an object the init-declarator shall have one of the forms

direct-declarator = assignment-expression
direct-declarator = { assignment-expression }
direct-declarator = { assignment-expression , }

The declared type is the type of the assignment expression after lvalue, array to pointer or function to pointer conversion, additionally qualified by qualifiers and amended by attributes as they appear in the declaration specifiers, if any. If the direct declarator is not of the form identifier attribute-specifier-sequence_opt, possibly enclosed in balanced pairs of parentheses, the behavior is undefined.
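
As a small sketch of what this means in practice (the function name inferred_size is invented for the example), the inferred declaration below is guarded by a version check so the file still compiles with a pre-C23 compiler, where the type has to be spelled out:

```c
#include <stddef.h>

size_t inferred_size(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    auto ratio = 0.5;     /* C23: type inferred from the initializer, here double */
#else
    double ratio = 0.5;   /* earlier standards: the type must be written out */
#endif
    return sizeof ratio;  /* the same value in both branches */
}
```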

Type inference is very useful in C++ because types can be very complex and almost impossible to specify in variable definitions, especially with templates. Conversely, using it in C is probably counterproductive, lessening code readability and encouraging laziness and error-prone practices. It was already bad enough to hide pointers behind typedefs; now you can hide them completely with the auto keyword.


To finish on a less serious note, I remember seeing it used in tricky interview tests, where the candidate is asked to find why this code does not compile:

#include <stdio.h>
#include <string.h>

int main(void) {
    char word[80];
    int auto = 0;
    while (scanf("%79s", word) == 1) {
        if (!strcmp(word, "car")
        ||  !strcmp(word, "auto")
        ||  !strcmp(word, "automobile"))
            auto++;
    }
    printf("cars: %d\n", auto);
    return 0;
}
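
For readers who want the answer: auto is a reserved keyword, so it cannot be used as a variable name. A compiling variant only needs a different identifier. The sketch below is not the interview code itself; it moves the counting into a hypothetical helper, count_cars, that takes an array of words instead of reading standard input.

```c
#include <stddef.h>
#include <string.h>

/* Count the words that name a car. The counter is called car_count
   because auto is a keyword and cannot be used as an identifier. */
int count_cars(const char *const words[], size_t n)
{
    int car_count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(words[i], "car")
        ||  !strcmp(words[i], "auto")
        ||  !strcmp(words[i], "automobile"))
            car_count++;
    }
    return car_count;
}
```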
0
On

The auto keyword originates from the B language, where it was actually very useful: it allowed the compiler to distinguish local names from non-local names (marked with the extrn keyword):

main()
{
    extrn printf;
    auto x;
    x = 25;
    printf("%d", x);
}

When the B language evolved into C, it preserved a high degree of backward compatibility. B had basically only a single "cell" type, so type annotations were introduced in C as an optional feature. In C89 and earlier, auto could be used for the same purpose of introducing local names:

main()
{
    extern printf();
    auto x; /* type is int by default */
    x = 42;
    printf("%d", x);
}


After the language's focus shifted towards enforcing type safety, the need for the auto specifier evaporated completely, since the presence of a type annotation is enough to identify a local name declaration.

5
On

First of all, auto is one of 4 or 5 storage-class specifiers: auto, register, static, extern, and from C11 on _Thread_local. Every variable in C has one associated storage-class specifier from the above list, with auto being the default for variables declared inside a function if none is specified.

From a user's perspective, due to auto being the default, it is rarely [1] necessary to specify it, and arguably doing so is just noise -- the other specifiers stand out more if no specifier is generally used.

From a compiler writer's perspective, however, since every variable has a storage-class specifier, the concept of auto is paramount, and putting yourself in their shoes, you can imagine that somewhere exists an enum enumerating the 4 (or 5) different specifiers and each variable declaration having one of the enum values attached.

The fact that it appears in the compiler does not require that it appears in the language, but it does provide an argument for it: regularity. The concept exists regardless of whether it's directly exposed (or not) and there is little cost in exposing it, so might as well, no?

[1] @BenVoigt mentioned that it may be useful in macros, where the type is user-provided, as it prevents the user from specifying another storage specifier such as static, since the compiler will not accept two storage specifiers.
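
A minimal sketch of that macro idea, assuming a hypothetical macro named LOCAL_VAR (not taken from any real code base):

```c
/* Hypothetical macro: declares a block-scope variable of a caller-supplied
   type and forces automatic storage by prepending auto. */
#define LOCAL_VAR(type, name) auto type name

int doubled(int v)
{
    LOCAL_VAR(int, tmp) = v * 2;   /* expands to: auto int tmp = v * 2; */
    return tmp;
}

/* By contrast, LOCAL_VAR(static int, counter) = 0; would expand to
   "auto static int counter = 0;", which the compiler rejects because
   two storage-class specifiers appear in one declaration. */
```

Calling doubled(21) yields 42; the interesting part is the rejected expansion in the comment, which is what stops a caller from sneaking a static variable through the macro.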

1
On

The auto keyword in C is not very useful to most programmers. However, it can be useful to compiler writers.

The symbol table is a data structure that the compiler uses to keep track of all the variables and functions in a program. When the compiler sees an auto declaration, it knows that the variable will be allocated on the stack. This means that the compiler can optimize the code for that variable, such as by avoiding storing it in a register.

For example, consider the following function:

void soso(int x) {
  int y = x * 2;
  // The compiler could optimize this code if it knew that y was allocated on the stack.
  int z = y + 3;
}

If the compiler knew that y was allocated on the stack, it could avoid storing y in a register. This would save memory and improve the performance of the function.

Of course, the auto keyword is not always necessary to improve the performance of compiler-generated code. However, it can be a useful tool for compiler writers who want to optimize their code.

Here are some additional details about the auto keyword:

The auto keyword is not necessary in C. The compiler automatically assumes that any variable declared inside a function has automatic storage.
Before C23, the auto keyword cannot be used to declare variables outside of functions; a compiler rejects such a declaration.
The auto keyword has been part of every version of the C standard, so conforming compilers accept it. Only its C23 type-inference meaning depends on how recent the compiler is.