Valgrind: libnvidia-glcore.so.346.47 Conditional jump or move depends on uninitialised value(s)

When I run my C++ test application with my dynamic library, which links against NVIDIA's libGL.so, Valgrind reports the errors shown below. I am tempted to suppress them, but I'm not sure whether the problem is in my code or in something like libnvidia-glcore.so. Part of my uncertainty stems from an incomplete understanding of Valgrind's output. I looked for variables in my code that could be uninitialized when glXCreateContextAttribsARB is called, but I don't see any. If you can't tell from my question, what kinds of things should I be looking for? The two errors I am getting are as follows:

==10156== Conditional jump or move depends on uninitialised value(s)
==10156==    at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEEADC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F75DA1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F775D3: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F760F5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F3E353: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x4E535F2: opengl_core::render_system::init() (x11_render_system.cpp:92)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156==  Uninitialised value was created by a heap allocation
==10156==    at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156==    by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156==    by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== 

      


==10156== Conditional jump or move depends on uninitialised value(s)
==10156==    at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DF085F: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4B78B: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4CFBC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F4BFE0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F38ED5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7B20F52: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7F3E2CB: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==  Uninitialised value was created by a heap allocation
==10156==    at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156==    by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156==    by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156==    by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156==    by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156==    by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== 

      


On request:

 // src/x11_render_system.cpp
 91       m_impl->m_context.make_current(m_impl->m_window);
 92       glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
 93       glClearColor(1.0, 0.0, 0.0, 1.0);  
 94       glXSwapBuffers(display, window);   
 95       m_impl->m_context.make_not_current();

      

+3




3 answers


Valgrind is quite prone to false positives with low-level hardware drivers (such as GPU drivers) because of the way they work. Basically, these drivers access GPU memory (and even registers) through user-space (virtual memory) mappings that are set up by the operating system (this is what POSIX mmap does). Thus, the driver can access device registers through arbitrary addresses, just like any other variable.
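To make the mapping idea concrete, here is a minimal sketch that reads a value through an mmap'ed address. It assumes a POSIX system and uses an ordinary file as a stand-in for a device node, since a real register mapping needs the actual hardware. The names write_fake_register and read_mapped_register are hypothetical and have nothing to do with NVIDIA's driver.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Writes one 32-bit word to `path`, standing in for whatever the hardware
// would have placed in a register. Returns true on success.
bool write_fake_register(const char* path, std::uint32_t value) {
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0) return false;
    ssize_t n = write(fd, &value, sizeof value);
    close(fd);
    return n == static_cast<ssize_t>(sizeof value);
}

// Maps the file read-only and reads the word through a plain pointer,
// the way a user-space driver would read a mapped register.
bool read_mapped_register(const char* path, std::uint32_t* out) {
    assert(out != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    void* base = mmap(nullptr, sizeof(std::uint32_t), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return false;
    // volatile: the value may change without this program ever writing it.
    const volatile std::uint32_t* reg =
        static_cast<const volatile std::uint32_t*>(base);
    *out = *reg;
    munmap(base, sizeof(std::uint32_t));
    return true;
}
```

From the program's point of view the read in read_mapped_register is just a pointer dereference; nothing in the code shows who produced the value behind it.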

The point is that some device registers are read-only. For example, they may reflect some device status, so only the device has any reason to write them (and even if the CPU tried to, the write would fail). Most of the time the device does this internally at power-up and from time to time when its state changes, and the mapping makes the result visible in user space. In essence, these are purely volatile variables, even more volatile than the usual notion of variables shared between threads, which, incidentally, Valgrind handles well because it emulates the CPU.

But Valgrind lives in a deterministic world (CPU and RAM), and these GPU registers are completely outside that world. When the driver reads them, Valgrind simply thinks it is accessing RAM (because of mmap), which is wrong. So when the driver uses the data it read (some device state) to branch accordingly, Valgrind reports an error, because nothing in its world has ever written that data.
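For comparison, the same kind of report can be reproduced in ordinary code: a heap block that is allocated but never written, followed by a branch on its contents. The sketch below is only an illustration. DeviceState and the three functions are hypothetical names, not the layout or logic of anything inside the NVIDIA driver.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical state block, invented for this example.
struct DeviceState {
    int ready;  // expected to be filled in by "someone else"
    int mode;
};

// Allocates with malloc and writes only `mode`. The bytes of `ready` are
// never written by any code that Memcheck can observe.
DeviceState* make_state_unwritten() {
    auto* s = static_cast<DeviceState*>(std::malloc(sizeof(DeviceState)));
    assert(s != nullptr);
    s->mode = 3;
    return s;
}

// Same allocation, but calloc zero-fills it, so every byte counts as defined.
DeviceState* make_state_zeroed() {
    auto* s = static_cast<DeviceState*>(std::calloc(1, sizeof(DeviceState)));
    assert(s != nullptr);
    s->mode = 3;
    return s;
}

// Branches on `ready`. With an unwritten block, this is the kind of
// conditional jump that Memcheck complains about.
int pick_path(const DeviceState* s) {
    if (s->ready) return s->mode;
    return 0;
}
```

In an unoptimized build, calling pick_path(make_state_unwritten()) under valgrind --track-origins=yes should give a report of the same shape as the ones in the question, with the origin attributed to malloc, while the make_state_zeroed() variant should stay silent. When the whole stack is inside a closed-source library, as here, there is no way to tell from the outside which of the two situations applies.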

Let's be honest: proprietary drivers are not open source, so it's hard to know what is really going on, but it's probably something similar. What I can tell you for sure is that this has been happening with Valgrind and GPU drivers for ages (even with very small programs), mostly during initialization, and everyone agrees that these are false positives. So you can safely ignore it, or create a suppression file for Valgrind in your project (call it valgrind.supp):



{
  NVidia-driver
  Memcheck:Cond
  obj:/usr/lib64/nvidia/libnvidia-glcore.so.346.47
}

      

Then invoke Valgrind with --suppressions=valgrind.supp and it will no longer report this false positive.

You may see other driver objects involved as well; just add entries for them (repeat the whole {...} block and change the obj: line to match what Valgrind reports). You may also need to update the entries every time you update your driver, since the version in the file name changes, although I think you can use wildcards to avoid this.
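As a sketch of the wildcard idea: obj: lines accept * and ? wildcards, so the version suffix can be left open. This assumes the driver libraries stay under /usr/lib64/nvidia, the path shown in the question's traces; the name on the first line is an arbitrary label.

```
{
   nvidia-glcore-cond-any-version
   Memcheck:Cond
   obj:/usr/lib64/nvidia/libnvidia-glcore.so.*
}
```

If you are unsure what an entry should look like, running Valgrind with --gen-suppressions=all makes it print a ready-made suppression after each error, which you can then trim and generalize.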

See the "Suppressing errors" section of the Valgrind manual for more information on this feature.

+2




Take the following code:

bool x_init = false;

int x;

void initX(){
    x = 4;
    x_init = true;
}

bool X_initialized(){
    return x_init;  // report whether initX() has run
}

//...

if( X_initialized() && x <3){
    doSomething(x);
}

      

In this case it is obvious that x is never used uninitialized. However, the compiler or Valgrind would have to prove this, and what it sees is that "x < 3" uses x without it having been initialized. Proving this in general is not possible. So if the driver code is obfuscated, or was simply written without Valgrind in mind (driver vendors tend to rely on their own tests more than on profiling tools), it is quite possible that Valgrind cannot work this out. That is not a bug in Valgrind but a mathematical limit, or, if you prefer, a consequence of the third party's coding style.



However, you should report this to the developers of the code you are using (NVIDIA?); it may be a real issue that should be fixed.

Another possibility is that at some point their code requires "random behavior", and so they use uninitialized values as a source of non-deterministic data. There are no silver bullets: if you use coverage tools you will soon find that 100% coverage is not always possible, and if you use profiling tools they will hit their limits sooner or later too.

Another possibility is that these "uninitialized" values are simply "volatile" variables that are initialized when the drivers are loaded (after system boot), and hence the application cannot see them as initialized (probably the most plausible case).

+1




Can you show the code around x11_render_system.cpp:92?

In my opinion, though, Valgrind may simply be wrong here; just ignore the report unless you find an actual problem connected to it.

-1

