Does the runtime dereference of a nullptr always result in Segmentation Fault?


tl;dr

What happens if "at runtime" a pointer p such that p == nullptr is dereferenced and its "pointee" is read from/written to?

Is a segmentation fault then guaranteed, because memory address 0 simply can't be read from or written to?


The reason I wrote "at runtime" is that I'm aware the compiler will assume p != nullptr when it sees *p (with p a raw pointer), so some *p can simply disappear due to UB. In that case no nullptr is actually dereferenced at that point at runtime, and the program keeps going through the alleys of undefined behavior.

For instance, given this TU,

int* bar();

int foo(int* i) {
    return *i;
}

int work() {
    int* b = bar();
    int f = foo(b);
    return b != nullptr ? 0 : (f + *bar());
}

the compiler can assume bar() != nullptr just by seeing that its return value is dereferenced in foo's body, which is only valid if bar() != nullptr. Indeed the function work is compiled down to

work():
        sub     rsp, 8
        call    bar()
        xor     eax, eax
        add     rsp, 8
        ret

where the value returned by bar() is never actually dereferenced, even though *bar() appears in the source code.

But foo (imagine it alone in its own TU, so there's no UB going on) is compiled down to

foo(int*):
        mov     eax, DWORD PTR [rdi]
        ret

so if this TU is linked against another one that calls foo(nullptr), then the mov instruction will definitely try to read from the address in rdi which is whatever nullptr is compiled to, I guess 0.

Does that mean that Segmentation Fault is unavoidable? (I'm not saying I want to avoid segv, quite the opposite, I'm just wondering if things can go so wrong that there's no segv and the program keeps going who knows where!)


(¹) I'm aware that the issue of whether dereferencing a null pointer is UB is still unresolved, but this is not a duplicate of my previous question.

ComicSansMS On BEST ANSWER

You are mixing several concerns into one question here which amplifies the confusion in this matter. You should take care to distinguish:

  • Your C++ program - Undefined behavior is undefined. Anything could happen. That is all that the standard says in this matter, and as your question is tagged C++, there is nothing more to say from a C++ perspective.
  • The executed machine code - The compiler will most likely still produce some machine code for your attempted dereference operation. You can't tell beforehand what machine code you will get, as the compiler can do whatever it wants due to the UB at play. But I understand your question to be about the machine code you observe after translation: that machine code of course has well-defined semantics on the executing hardware platform, potentially influenced by the operating system it runs on. You can consult the documentation for both to find out what the exact behavior will be, and that will be the reliable outcome of the execution, regardless of what the C++ standard says. As you did not tag your target platform and OS in the question, I will refrain from guessing what will happen on your machine.

So in short: You won't know what happens by looking at the C++ code. You are, in principle, able to know what will happen when looking at the generated binary, given that you understand the behavior of your target platform well enough.

To answer the literal headline of your question: there do exist hardware/OS configurations in the real world (in particular in embedded systems) where dereferencing the null address will not be caught by either the hardware or the OS, so you cannot rely on this always resulting in program termination by the environment.

In practice, should you care? Not unless you need to reason about the behavior of a pre-generated binary which you are unable to change. From a C++ perspective it is easy enough to perform a null check before dereferencing and (reliably) terminate the program manually if that is the desired outcome. Reasoning about program behavior after the actual dereference has occurred is not sensible from a C++ perspective.
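
As a rough illustration of the "check, then terminate manually" approach, here is a minimal sketch. The helper name deref_or_abort is hypothetical (it is not from the standard library or from the question), and std::abort is just one possible way to terminate:

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: returns a reference to the pointee, or terminates
// the program in a well-defined way if the pointer is null.
template <typename T>
T& deref_or_abort(T* p) {
    if (p == nullptr) {
        std::fputs("fatal: attempted to dereference a null pointer\n", stderr);
        std::abort();  // defined termination; the dereference below is never reached
    }
    return *p;  // only evaluated when p is non-null, so no UB here
}
```

Because the check happens before any dereference, the compiler has no grounds to assume p is non-null and cannot remove it; what happens on a null pointer is then decided by the program rather than by the platform.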

wearetherobots On

The short answer: dereferencing nullptr is UB and does not always result in a segmentation fault.

That is because it depends on:

  • what the compiler does,
  • what the program does.

I've seen code like this in production code (simplified in this example, the lines were obviously not close to each other in the actual case):

struct SomeType;
void f(SomeType const& ref)
{
  if (&ref == nullptr)
    return; // bailout (a compiler may assume this is never true and drop the check)
  // ...
}
void g()
{
  SomeType* p = nullptr;
  f(*p);
}

Crazy, right? But you can see why that "works", if you assume that references are implemented as pointers by the compiler...

But, as mentioned in many of the comments, you should not write code like that. Although this is sometimes more easily said than done (we all write buggy code at some point...).
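
For comparison, a well-defined variant might look like the sketch below. It assumes the function can be changed to take a pointer; the member value and the bool return type are made up for illustration and are not part of the production code described above:

```cpp
// Illustrative definition only; the real SomeType is not shown in this answer.
struct SomeType { int value; };

// Taking a pointer makes "no object" a legal argument, so the null check
// is meaningful and the compiler is not entitled to remove it.
bool f(SomeType const* p)
{
  if (p == nullptr)
    return false; // bailout
  // ... use *p here, it is known to be non-null
  return true;
}

bool g()
{
  SomeType* p = nullptr;
  return f(p); // the pointer is passed as-is, nothing is dereferenced
}
```

Here g() never forms a reference from a null pointer, so there is no UB for the optimizer to exploit.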


At the risk of going slightly off-topic, let me explain how the above issue was identified in the said production code, because it eventually caused a segmentation fault.

The check on nullptr was not in the code of the function, it was instead in the logic of dynamic_cast, which results in nullptr when applied to nullptr. Unfortunately, a new version of GCC came with an optimisation in dynamic_cast that removed the check on nullptr when the pointer was determined to not be nullable, as in the case of taking the address of a reference...

void f(SomeType const& ref)
{
  if (dynamic_cast<SomeOtherType const*>(&ref) == nullptr)
    return; // bailout
  // ...
}

In this instance, the issue was relatively quickly identified and the code fixed... but this is just one example among many!