Valgrind message about invalid read on one system, but not another

I need to get a fairly large software package running on a new machine. The application is written in C and C++, and I am running CentOS 6.5.

The program builds fine, but segfaults when I go to run it. Using valgrind I see the following error at the segfault location:

==23843== Invalid read of size 4
[stack trace here]
==23843==  Address 0x642e7464 is not stack'd, malloc'd or (recently) free'd

      

So for some reason we are reading memory we shouldn't, which invokes undefined behavior. However, when I transfer my source files to another CentOS 6.5 machine (with the same kernel) and compile them (with the same makefiles and the same GCC version), the program works fine.

I ran valgrind on this second machine and expected to see the invalid read again. My thinking was that the invalid read would always be present, and that the undefined behavior just happens to work on one machine and not on the other.

However, I found that valgrind does not report read errors on the second machine. How is this possible?

+3



3 answers


Valgrind makes the execution environment more deterministic, but it doesn't eliminate all randomness. Perhaps the other machine has different versions of the libraries installed, or something external that the program uses (files, network, ...) is different, so the code need not execute in exactly the same way.



You should look at the stack trace and analyze the code where the error occurs. If the cause is not obvious from the stack trace, you can start valgrind with the option --vgdb=full, combined with --vgdb-error=1 so that it stops at the first error. Valgrind then pauses execution after the error occurs and prints instructions for attaching gdb. Or you can just run the program directly under the debugger, since you wrote that it crashes even without valgrind.
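
The gdbserver workflow described above might look roughly like the following sketch. The function name run_with_vgdb and the binary ./myprog are placeholders, and --vgdb-error=1 is an assumption on my part so that valgrind stops at the first reported error; valgrind prints the exact attach command itself.

```shell
#!/bin/sh
# Sketch: run a program under valgrind's built-in gdbserver.
# run_with_vgdb and ./myprog are hypothetical names used for illustration.

run_with_vgdb() {
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind not found in PATH" >&2
        return 127
    fi
    # --vgdb=full    : more precise breakpoints and stepping (slower)
    # --vgdb-error=1 : freeze and wait for gdb after the first reported error
    valgrind --vgdb=full --vgdb-error=1 "$@"
}

# Terminal 1:
#   run_with_vgdb ./myprog
#
# Terminal 2, once valgrind reports that it is waiting for gdb:
#   gdb ./myprog
#   (gdb) target remote | vgdb
#   (gdb) bt
```

Once attached, a backtrace and a look at the pointer being dereferenced usually narrow things down faster than the valgrind log alone.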

+3



Different library versions are the best guess, judging by the limited information you have given. Things to try:

1) Update both machines to the latest package versions using the package manager and try again.



2) Run ldd [binary] to view all libraries used by the program. Run something like md5sum on them on both machines to see if there are any differences.
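
Step 2 could be scripted along these lines. This is only a sketch: the function name lib_checksums and the file names in the comments are made up for illustration, and it assumes the usual ldd output format with "name => /path (address)" lines.

```shell
#!/bin/sh
# Sketch: print an MD5 checksum for every shared library a binary resolves to.
# lib_checksums is a hypothetical helper name.

lib_checksums() {
    # ldd prints lines such as "libc.so.6 => /lib64/libc.so.6 (0x...)" and,
    # for the dynamic loader, "/lib64/ld-linux-x86-64.so.2 (0x...)".
    # Keep only the absolute paths, drop duplicates, and checksum each file.
    ldd "$1" |
    awk '$2 == "=>" && $3 ~ /^\// { print $3 } $1 ~ /^\// { print $1 }' |
    sort -u |
    while read -r lib; do
        md5sum "$lib"
    done
}

# On each machine (./myprog stands in for the real binary):
#   lib_checksums ./myprog > libs-$(hostname).txt
# Then copy one file to the other machine and compare:
#   diff libs-machineA.txt libs-machineB.txt
```

Any line that shows up in the diff points at a library that differs in path or content between the two machines.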

In general, my experience is that valgrind is really bad at detecting invalid memory accesses on the stack, so this could be a hidden root cause. If all else fails, you can try clang's AddressSanitizer. It can find things that valgrind won't catch, and vice versa.
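
A build with AddressSanitizer might look like the sketch below. It assumes a clang version with AddressSanitizer support is installed; asan_build, prog.c and prog_asan are placeholder names, and in a real project the same flags would go into CFLAGS/CXXFLAGS and LDFLAGS in the makefiles.

```shell
#!/bin/sh
# Sketch: compile with AddressSanitizer enabled.
# asan_build is a hypothetical wrapper; CC defaults to clang.

asan_build() {
    # -fsanitize=address       : instrument memory accesses (heap, stack, globals)
    # -g                       : keep debug info so reports show file and line
    # -fno-omit-frame-pointer  : gives more readable stack traces
    # -O1                      : light optimization keeps the slowdown tolerable
    "${CC:-clang}" -fsanitize=address -g -fno-omit-frame-pointer -O1 "$@"
}

# Example (prog.c is a placeholder source file):
#   asan_build prog.c -o prog_asan
#   ./prog_asan
```

The instrumented binary is run directly, not under valgrind; it aborts with a report at the first bad access it detects.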

0



This can be caused by using different versions of Valgrind.

Some common false positives have been removed in newer versions. That would explain why one machine (older version) complains and the other (newer version) does not.
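
To check whether the versions really differ, something like this sketch can be run on both machines and the output compared. The function name env_fingerprint is made up, and the list of tools is just an assumption about what is relevant here.

```shell
#!/bin/sh
# Sketch: print the kernel release and the versions of valgrind and gcc.
# env_fingerprint is a hypothetical helper name.

env_fingerprint() {
    echo "kernel: $(uname -r)"
    for tool in valgrind gcc; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Only the first line of the version banner is of interest.
            echo "$tool: $("$tool" --version | head -n 1)"
        else
            echo "$tool: not installed"
        fi
    done
}

env_fingerprint
```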

0

