Are memcheck errors ever acceptable?


The valgrind quickstart page mentions:

Try to make your program so clean that Memcheck reports no errors. Once you achieve this state, it is much easier to see when changes to the program cause Memcheck to report new errors. Experience from several years of Memcheck use shows that it is possible to make even huge programs run Memcheck-clean. For example, large parts of KDE, OpenOffice.org and Firefox are Memcheck-clean, or very close to it.

This block left me a little perplexed. Given how the C standard works, I would assume that most (if not all) practices that produce Memcheck errors invoke undefined behavior, and should therefore be avoided like the plague.

However, the last sentence in the quoted block implies there are in fact "famous" programs that run in production with Memcheck errors. After reading this, I thought I'd put it to the test: I ran VLC under Valgrind and got a bunch of Memcheck errors right after starting it.

This led me to my question: are there ever good reasons not to eliminate such errors from a program in production? Is there anything to be gained from releasing a program that contains such errors? If so, how do the developers keep it safe, given that a program containing such errors can, to my knowledge, act unpredictably, and that no assumptions can be made about its behavior in general? Can you provide real-world examples of cases in which a program is better off running with those errors than without?


There are 5 answers below.

Answer (score 0)

If a piece of code runs in a context where uninitialized storage can never contain confidential information that must not leak, some algorithms may benefit from a guarantee that reading uninitialized storage has no side effects beyond yielding likely-meaningless values.

For example, suppose it is necessary to quickly set up a hash map that will often have only a few items placed in it before it is torn down, but might sometimes hold many. A useful approach is to keep an array that holds the data items in the order they were added, along with a hash table that maps hash values to storage-slot numbers. If the number of items stored in the table is N, an item's hash is H, and accessing hashTable[H] is guaranteed to yield a value I that is either the number stored there (if any) or else an arbitrary number, then one of three things will happen:

  • I might be greater than or equal to N. In that case, the table does not contain a value with a hash of H.

  • I might be less than N, but items[I].hash != H. In that case, the table does not contain a value with a hash of H.

  • I might be less than N, and items[I].hash == H. In that case, the table rather obviously contains at least one value (the one in slot I) with a hash of H.

Note that if the uninitialized hash table could contain confidential data, an adversary who can trigger hashing requests may be able to use timing attacks to gain some information about its contents. The only situations where the value read from a hash table slot could affect any aspect of function behavior other than execution time, however, would be those in which the hash table slot had been written.

To put things another way, the hash table would contain a mixture of initialized entries that would need to be read correctly, and meaningless uninitialized entries whose contents could not observably affect program behavior, but the code might not be able to determine whether the contents of an entry might affect program behavior until after it had read it.

For a program to read uninitialized data when it is expecting initialized data would be a bug, and since most reads in a program expect initialized data, most attempts to read uninitialized data are bugs. If a language included a construct to explicitly request that an implementation either read data if it had been written, or otherwise yield some arbitrary value with no side effects, it would make sense to regard attempts to read uninitialized data without such a construct as defects. In a language without such a construct, however, the only way to avoid warnings about reading uninitialized data is to forgo some useful algorithms that could otherwise benefit from the aforementioned guarantee.

Answer (score 3)

There has been a case where fixing the errors reported by Valgrind actually led to a security flaw; see e.g. https://research.swtch.com/openssl . The intention behind the use of uninitialised memory was to increase entropy by mixing in some random bytes, and the fix led to more predictable random numbers, thereby weakening security.

In case of VLC, feel free to investigate ;-)

Answer (score 3)

One instance is when you are deliberately writing non-portable code to take advantage of system-specific optimizations. Your code might be undefined behavior with respect to the C standard, but you happen to know that your target implementation does define the behavior in a way that you want.

A famous example is optimized strlen implementations, such as those discussed in "vectorized strlen getting away with reading unallocated memory". You can design such algorithms more efficiently if they are allowed to potentially read past the terminating null byte of the string. This is blatant UB for standard C, since the read might be past the end of the array containing the string. But on a typical real-life machine (say, x86 Linux), you know what will actually happen: if the read touches an unmapped page, you will get SIGSEGV, and otherwise the read will succeed and give you whatever bytes happen to be in that region of memory. So if your algorithm checks alignment to avoid crossing page boundaries unnecessarily, it may still be perfectly safe for x86 Linux. (Of course you should use appropriate #ifdefs to ensure that such code isn't used on systems where you can't guarantee its safety.)

Another instance, more relevant to memcheck, might be if you happen to know that your system's malloc implementation always rounds up allocation requests to, say, multiples of 32 bytes. If you have allocated a buffer with malloc(33) but now find that you need 20 more bytes, you could save yourself the overhead of realloc() because you know that you were actually given 64 bytes to play with.

Answer (score 2)

memcheck is not perfect. The following are some of its limitations and possible reasons for a higher false positive rate:

  1. the shadow-bit propagation rules memcheck uses to reduce overhead, which can raise the false positive rate
  2. imprecise representation of flag registers
  3. higher compiler optimization levels

From the Memcheck paper (published at USENIX 2005) - though things may well have changed since then:

A system such as Memcheck cannot simultaneously be free of false negatives and false positives, since that would be equivalent to solving the Halting Problem. Our design attempts to almost completely avoid false negatives and to minimise false positives. Experience in practice shows this to be mostly successful. Even so, user feedback over the past two years reveals an interesting fact: many users have an (often unstated) expectation that Memcheck should not report any false positives at all, no matter how strange the code being checked is.

We believe this to be unrealistic. A better expectation is to accept that false positives are rare but inevitable. Therefore it will occasionally be necessary to add dummy initialisations to code to make Memcheck be quiet. This may lead to code which is slightly more conservative than it strictly needs to be, but at least it gives a stronger assurance that it really doesn't make use of any undefined values. A worthy aim is to achieve Memcheck-cleanness, so that new errors are immediately apparent. This is no different from fixing source code to remove all compiler warnings, even ones which are obviously harmless.

Many large programs now do run Memcheck-clean, or very nearly so. In the authors' personal experience, recent Mozilla releases come close to that, as do cleaned-up versions of the OpenOffice.org-680 development branch, and much of the KDE desktop environment. So this is an achievable goal.

Finally, we would observe that the most effective use of Memcheck comes not only from ad-hoc debugging, but also when routinely used on applications running their automatic regression test suites. Such suites tend to exercise dark corners of implementations, thereby increasing their Memcheck-tested code coverage.

Here's a section on avoiding false positives:

Memcheck has a very low false positive rate. However, a few hand-coded assembly sequences, and a few very rare compiler-generated idioms can cause false positives.

You can find the origin of an error using the --track-origins=yes option; with it you may be able to see what's going on.

Answer (score 2)

My experience of posts concerning Valgrind on Stack Overflow is that there is often a misplaced sense of overconfidence, a lack of understanding of what the compiler and Valgrind are doing, or both [neither of these observations is aimed at the OP]. Ignoring errors for either of these reasons is a recipe for disaster.

Memcheck false positives are quite rare. I've used Valgrind for many years and I can count the types of false positives that I've encountered on one hand. That said, there is an ongoing battle between the Valgrind developers and the code that optimising compilers emit. For instance, see this link (if anyone is interested, there are plenty of other good presentations about Valgrind on the FOSDEM website). In general, the problem is that optimizing compilers may make any change so long as there is no observable difference in behaviour. Valgrind has baked-in assumptions about how executables work, and if a new compiler optimization steps outside those assumptions, false positives can result.

False negatives usually mean that Valgrind has not correctly encapsulated some behaviour. Usually this will be a bug in Valgrind.

What Valgrind won't be able to tell you is how serious an error is. For instance, you may have a printf that is passed a pointer to a character array that contains some uninitialized bytes but which is always nul-terminated. Valgrind will detect an error, and at runtime you might get some random rubbish on the screen, which may be harmless.

One example that I've come across where a fix is probably not worth the effort is the use of the putenv function. If you need to put a dynamically allocated string into the environment, then freeing that memory is a pain. You either need to save the pointer somewhere, or save a flag indicating that the env var has been set, and then call some cleanup function before your executable terminates. All that just for a leak of around 10-20 bytes.

My advice is

  1. Aim for zero errors in your code. If you allow large numbers of errors then the only way to tell if you introduce new errors is to use scripts that filter the errors and compare them with some reference state.
  2. Make sure that you understand the errors that Valgrind generates. Fix them if you can.
  3. Use suppression files. Use them sparingly: for errors in third-party libraries that you cannot fix, for harmless errors where the fix is worse than the error, and for any false positives.
  4. Use the -s/-v Valgrind options and remove unused suppressions when you can (this will probably require some scripting).
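For point 3, a suppression entry looks roughly like the following (the suppression name, frame, and library pattern are hypothetical; Valgrind can generate real entries for you with --gen-suppressions=all):

```
# Reachable allocation from a third-party library we cannot fix
{
   third_party_putenv_leak
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   ...
   obj:*/libthirdparty.so*
}
```

Pass the file with --suppressions=myapp.supp; the `...` frame wildcard keeps the entry stable across library versions.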