Valgrind: libnvidia-glcore.so.346.47 Conditional jump or move depends on uninitialised value

When running my test C++ app against my dynamic library, which links against NVIDIA's libGL.so, Valgrind reports the errors below. I am tempted to suppress them, but I am not sure whether the problem is mine or in libnvidia-glcore.so. Part of the uncertainty stems from not fully understanding Valgrind's output. I have looked for uninitialized variables in my code around the call to glXCreateContextAttribsARB, but I do not see any. If the output suggests the issue is mine, what kinds of things should I be looking for? The two errors I am getting are:

==10156== Conditional jump or move depends on uninitialised value(s)
==10156==    at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEEADC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F75DA1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F775D3: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F760F5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F3E353: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x4E535F2: opengl_core::render_system::init() (x11_render_system.cpp:92)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156==  Uninitialised value was created by a heap allocation
==10156==    at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156==    by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156==    by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== 

==10156== Conditional jump or move depends on uninitialised value(s)
==10156==    at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DF085F: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4B78B: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4CFBC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4BFE0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F38ED5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7B20F52: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F3E2CB: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==  Uninitialised value was created by a heap allocation
==10156==    at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156==    by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156==    by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== 

As requested, here is the code around line 92 of x11_render_system.cpp:

 // src/x11_render_system.cpp
 91       m_impl->m_context.make_current(m_impl->m_window);
 92       glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
 93       glClearColor(1.0, 0.0, 0.0, 1.0);  
 94       glXSwapBuffers(display, window);   
 95       m_impl->m_context.make_not_current();
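A reading aid for the traces above: in both errors, the "Uninitialised value was created by a heap allocation" stack shows malloc being called from inside libGL.so during glXCreateContextAttribsARB, so the bytes in question were allocated by the driver, not by the application. On the caller's side, the usual thing to check is the attribute list passed to glXCreateContextAttribsARB: it must be a fully initialized array of key/value pairs terminated by None. A minimal sketch, assuming a core-profile request; the token values are copied locally from the GLX_ARB_create_context and GLX_ARB_create_context_profile specifications so the sketch compiles without GL headers, and the constant and function names are hypothetical:

```cpp
#include <array>

// Hypothetical local copies of the GLX tokens; real code should take them
// from <GL/glx.h> / <GL/glxext.h> instead.
constexpr int kGlxContextMajorVersionArb   = 0x2091;
constexpr int kGlxContextMinorVersionArb   = 0x2092;
constexpr int kGlxContextProfileMaskArb    = 0x9126;
constexpr int kGlxContextCoreProfileBitArb = 0x0001;
constexpr int kNone = 0;  // GLX attribute lists are terminated by None (0)

// Every slot, including the terminator, gets a defined value, so the driver
// never walks past the last key/value pair into indeterminate ints.
std::array<int, 7> make_context_attribs(int major, int minor) {
    return {
        kGlxContextMajorVersionArb, major,
        kGlxContextMinorVersionArb, minor,
        kGlxContextProfileMaskArb,  kGlxContextCoreProfileBitArb,
        kNone
    };
}
```

In the real code the array would be passed as glXCreateContextAttribsARB(display, fb_config, nullptr, True, attribs.data()), where display and fb_config stand for whatever the application already has. If the list is built this way and the origin stack still ends inside libGL.so, that points away from the caller.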

Best answer:

Valgrind is quite prone to false positives with low-level hardware drivers (such as GPU drivers) because of the way they work. Basically, these drivers access the GPU's memory (and even its registers) from user space, through regions mapped into the process's virtual address space (this is POSIX mmap at work). This way, the driver can access the device's registers at ordinary addresses, like any other variable.

The point is that some device registers are only meant to be read. For example, they might reflect some status of the device. Thus, only the device has a reason to write them (and even if the CPU tried to, the write would fail). Most of the time, the device does so internally at power-up, and from time to time when its status changes, and the new value shows up in user space through the mapping. In essence, these are pure volatile variables, even more volatile than the usual thread-to-thread notion of volatility, which, by the way, Valgrind handles well since it emulates the CPU.

But Valgrind lives in a deterministic world (CPU and RAM), and these GPU registers are completely outside of it. When the driver reads them, Valgrind simply thinks it is accessing RAM (because of mmap), which is definitely not true. So when the driver uses the data it read (some device status) to decide which way to branch, Valgrind reports it, because nothing in its world ever wrote that data.
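The kind of access described above can be sketched in C++ as a read through a volatile pointer. This only illustrates the access pattern: the register layout and the kStatusReadyBit and device_ready names are hypothetical, and the sketch makes no claim about how Memcheck tracks a particular driver's mappings.

```cpp
#include <cstdint>

// Hypothetical register layout: bit 0 of a 32-bit status word means "ready".
constexpr std::uint32_t kStatusReadyBit = 1u << 0;

// In a driver, `reg` would point into a region mapped into the process
// (e.g. with mmap) that the device itself updates. `volatile` forces a real
// load on every call, because the value can change without any CPU store.
bool device_ready(const volatile std::uint32_t* reg) {
    const std::uint32_t status = *reg;       // one load from the "register"
    return (status & kStatusReadyBit) != 0;  // callers typically branch on this
}
```

Here an ordinary variable can stand in for the register when exercising the function, since nothing but the pointer type marks it as device memory.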

Let's be honest: proprietary drivers are not open source, so it's hard to know what is really happening, but it is likely something similar. What I can say for sure is that Valgrind has reported this kind of error in GPU drivers for ages (even with very small programs), mainly during initialization, and everybody agrees these are false positives. So you can safely ignore them, or create a suppression file for Valgrind in your project (let's name it valgrind.supp):

{
  NVidia-driver
  Memcheck:Cond
  obj:/usr/lib64/nvidia/libnvidia-glcore.so.346.47
}

Then run Valgrind with the option --suppressions=valgrind.supp and it will no longer report these false positives.

You may have other driver objects involved; just add entries for them (repeat the whole {...} block and change the obj: line to match what Valgrind reports). You may also have to update them every time you update your driver, since the version number in the file name changes. Valgrind accepts the wildcards * and ? in obj: lines, though, so a pattern such as /usr/lib64/nvidia/libnvidia-glcore.so* avoids this.
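To check that a pattern such as libnvidia-glcore.so* really covers versioned file names like libnvidia-glcore.so.346.47, here is a small C++ glob matcher with the usual semantics (* matches any run of characters, ? exactly one). It is an assumed model of how obj: patterns are matched, written for illustration; it is not Valgrind's own code, and glob_match is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Glob-style matcher: '*' matches any run of characters (including none),
// '?' matches exactly one character, everything else matches literally.
bool glob_match(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t resume = 0;                // text position that '*' resumes from
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;          // first try letting '*' match nothing
            resume = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1;        // backtrack: '*' absorbs one more character
            t = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;                     // trailing '*'s can match the empty tail
    }
    return p == pattern.size();
}
```

With this model, "/usr/lib64/nvidia/libnvidia-glcore.so*" matches the 346.47 library (and any later version), while it does not match libGL.so, which still needs its own entry.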

Take a look here for more info on this Valgrind feature.

Another answer:

Take the following code:

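#include <cstdio>

bool x_init = false;

int x;

void initX(){
    x = 4;
    x_init = true;
}

bool X_initialized(){
    // must return the flag: falling off the end of a non-void function is undefined behaviour
    return x_init;
}

// placeholder for whatever work uses x
void doSomething(int value){
    std::printf("%d\n", value);
}

//...

void useX(){
    // x is read only after X_initialized() has returned true
    if( X_initialized() && x < 3){
        doSomething(x);
    }
}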

In this case it is evident that x is never used uninitialized, but the compiler or Valgrind has to establish that, and what it may see is that "x < 3" uses x without initializing it. Proving arbitrary properties of code is, in general, impossible. So if drivers are obfuscated, or simply written without running Valgrind (driver vendors tend to have millions of tests, so they likely rely on those more than on profiling tools), it is quite possible that Valgrind cannot tell. This is not a failure of Valgrind but a mathematical limit or, if you wish, a failure of the third party's coding style.
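If you control the code, the simplest way to take the question away from the tool is to give every byte a defined value when it is allocated. A sketch, assuming the x / x_init pair above lived in a heap object; State and make_state are hypothetical names:

```cpp
#include <memory>

// Hypothetical heap-allocated version of the x / x_init pair above.
struct State {
    bool x_init;
    int x;
};

// `new State` would default-initialize: both members indeterminate, which is
// what Memcheck reports as "created by a heap allocation" once a branch
// depends on them. std::make_unique<State>() value-initializes instead,
// so x_init starts as false and x as 0.
std::unique_ptr<State> make_state() {
    return std::make_unique<State>();
}
```

The cost is one zeroing of the object at allocation time, which is usually negligible next to the time spent chasing a report like the ones in the question.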

However, you should report this to the maintainers of the code you are using (NVIDIA?); it may be an issue that needs to be fixed.

Another possibility is that at some point their code requires "random behaviour" and therefore uses uninitialized values as a source of non-deterministic data. There are no silver bullets: if you use coverage tools, you'll soon learn that 100% coverage is not always possible, and profiling tools will sooner or later fail too.
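For contrast, here is the well-defined way to get non-deterministic data in standard C++. This is not a claim about what the driver does; it only shows that reading uninitialized memory is not needed for this (and in C++ such a read is undefined behaviour, not a reliable entropy source). The function names are hypothetical.

```cpp
#include <random>

// std::random_device is the standard's non-deterministic source; note that an
// implementation is allowed to make it deterministic if no entropy is available.
unsigned int nondeterministic_seed() {
    std::random_device rd;
    return rd();
}

// Seed a fast engine once, then draw values from it.
int roll_die() {
    static std::mt19937 gen(nondeterministic_seed());
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(gen);
}
```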

Another possibility is that those "uninitialized" values are just "volatile" variables that are initialized when the drivers are loaded (after system bootstrap), so the application cannot see them as initialized (probably the most plausible case).