I'm currently using the VEINS library and simulation package to do some experiments. Because these have a very long run time, I'm trying to use the university cluster servers (KITE 2.0/RHEL6.6/Lustre 2.5.29.ddnpf3). However, I've now encountered several different runtime bugs with the same code that runs perfectly fine on my local machine (Fedora 23). I'm looking for a way to easily debug this problem. I suspect that the cause lies somewhere in the different gcc version, or perhaps in some other system-level library that I can't change remotely (but I'm not sure). I'm certain that the OMNeT++ version is the same; the VEINS library is provided by me and is the same locally and remotely.
An example of the issues I've encountered is discussed here, which I eventually fixed like this (as far as I can tell, both versions have the same semantics: `DimensionSet` extends `std::set`, and `DimensionSet::timeFreqDomain` is a `static const` initialized with `(Dimension::time, Dimension::frequency)`, as in the fix).
What is a good approach to look for the cause? Is there a simple way to "cross-compile" between these machines, or some way to diff the binaries to look for the cause? Where do I look for common ways to deal with problems like these?
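I might have tracked the error down to an example of a static initialization order fiasco: MiXiM's `Dimension::time` is a static member, so it should not have been used to initialize other static members, since C++ leaves the initialization order of static objects across translation units unspecified. Unfortunately, this is exactly what MiXiM (and, by extension, Veins) did, leading to such crashes. Which order you get can differ between compilers and linkers, which would explain why the same code runs on one machine and crashes on another.

I have pushed commit 7807f47c (part of Veins 4.4), which gets rid of almost all static members, so that the whole of the framework should be safer to use.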
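For anyone hitting the same class of bug in their own code, one common way to sidestep the fiasco is the construct-on-first-use idiom: wrap each static in a function that holds it as a function-local static. The sketch below is only an illustration of that idiom, not the code from the commit above. The `Dimension` and `DimensionSet` classes here are simplified, hypothetical stand-ins for the MiXiM ones, and the accessor functions `time()`, `frequency()` and `timeFreqDomain()` as well as the helper `checkTimeFreqDomain()` are my own assumptions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for MiXiM's Dimension (not the real class).
class Dimension {
public:
    explicit Dimension(std::string name) : name_(std::move(name)) {}

    bool operator<(const Dimension& other) const { return name_ < other.name_; }

    // Construct-on-first-use: a function-local static is initialized the
    // first time control passes through its declaration, so a caller can
    // never observe it before it has been constructed.
    static const Dimension& time() {
        static const Dimension instance("time");
        return instance;
    }

    static const Dimension& frequency() {
        static const Dimension instance("frequency");
        return instance;
    }

private:
    std::string name_;
};

// Hypothetical stand-in for DimensionSet, which extends std::set.
class DimensionSet : public std::set<Dimension> {
public:
    DimensionSet(const Dimension& a, const Dimension& b) {
        insert(a);
        insert(b);
    }

    // Depends on Dimension::time() and Dimension::frequency(); calling the
    // accessors forces those objects to exist before they are copied here.
    static const DimensionSet& timeFreqDomain() {
        static const DimensionSet instance(Dimension::time(), Dimension::frequency());
        return instance;
    }
};

// Illustrative helper: safe to call at any point, including from another
// static initializer, because every dependency is created on demand.
inline void checkTimeFreqDomain() {
    assert(DimensionSet::timeFreqDomain().size() == 2);
}
```

With this layout, `DimensionSet::timeFreqDomain()` can be called from any translation unit during static initialization without depending on link order. Since C++11 the initialization of function-local statics is also thread-safe; with an older compiler or standard mode that guarantee may not hold, so treat it as something to verify on the cluster toolchain.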