I'm getting a System.BadImageFormatException, but not while loading a DLL, which is where I would expect it. Instead, I get it when I set a bool in a class to true after replacing the DLLs on disk. It's consistent and repeatable, but the debugger doesn't reveal much, and it only started happening in the last week. We've been doing this exact process for quite a while and don't know what could have triggered this new behavior.
The exception is suspicious. When the exception occurs outside the debugger, the Console outputs the following:
Unhandled Exception.
Cannot print exception string because Exception.ToString() failed.
When I run under the debugger, the exception does get caught momentarily, although execution doesn't get past the catch header. If I inspect the other members of PM at that point, all the properties (not the fields) throw the same exception, with the message "path/to/dll The signature is incorrect". The BadImageFormatException thrown by setting PM.AiRestarting carries the message "Bad IL Range" (the stack trace points to the correct place, though the _stackTraceString variable is garbled). If I let it try doing anything else, I get kicked out of the debugger with the message:
"The target process exited with code 0 (0x00000000) while evaluating the function 'System.BadImageFormatException.ToString()'. If this problem happens regularly, consider disabling the Tools->Options setting "Debugging->General->Enable Property evaluation and other implicit function calls" or debugging the cause by evaluating the expression from the immediate window. See help for information on doing this."
My best guess is that because the files are getting replaced on disk (see details below), the signature is wrong (and is being checked for some reason). How can I prevent the process from looking at the DLL for literally the next 2 seconds before the restart occurs? Further details follow.
Background: We have a hardware appliance for which we've written a Linux back-end service and a web app service in .NET/Blazor. We've written an updater which updates both services from a remote application running on Windows. The service needs to be updated on the fly, but it must be restarted before the changes take effect.
Process: The manual way to update is to copy the new files over the existing ones. The automated way is to extract a tarball over the existing files. Either way, we make sure the ownership and permissions are correct before restarting the appliance. For cleanliness, the updater restarts the appliance it is running on. In the automated case, the extraction is done by a script that first removes wwwroot and other directories containing temporary files or files we want gone, and deletes old files that are no longer used and aren't in the tar file. It then extracts the tar file using
sudo tar --same-owner --touch --overwrite -xzf path/to/tarball.tar -C /
(The tar command might look a little odd because we're debugging this error at the moment.) The script then does some permissions management. When the script exits, the command makes the appliance restart, or it's supposed to, but that's when this error occurs.
Because we have a very complex setup, a simple project exhibiting this error is difficult to create for anyone not running our setup. Everyone here sees the error consistently, and as far as we can tell it occurs at the same spot every time. We regularly change files on disk on Linux without affecting what is already loaded in memory, so I don't understand how this could be different.
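For anyone trying to narrow down a similar case, one thing worth checking is whether an update step rewrites a file in place or swaps in a new file under the same name. A rough sketch follows, with hypothetical file names (app.dll, app.dll.new) in a temporary directory; it compares inode numbers after a plain cp over an existing file and after an mv over it.

```shell
#!/bin/sh
# Sketch: does an update step keep the same inode (in-place rewrite)
# or put a different inode under the same path (replacement)?
# File names are hypothetical and live in a throwaway temp directory.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Print the inode number of a path.
inode() { ls -i "$1" | awk '{print $1}'; }

printf 'v1' > "$dir/app.dll"
before=$(inode "$dir/app.dll")

# cp onto an existing file truncates and rewrites that same file.
printf 'v2' > "$dir/app.dll.new"
cp "$dir/app.dll.new" "$dir/app.dll"
after_cp=$(inode "$dir/app.dll")

# mv onto an existing file renames the new file over the old name.
mv "$dir/app.dll.new" "$dir/app.dll"
after_mv=$(inode "$dir/app.dll")

echo "before=$before after_cp=$after_cp after_mv=$after_mv"
```

If the inode is unchanged after an update step, anything that still has the old file open or mapped is now looking at the new bytes.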
Here's the function the error occurs in:
private int HandleInstallFirmwareUpdateStep2(byte[] arrUserCred, byte[] arrCmdData,
    StringBuilder logMsg, out byte[] arrRespData)
{
    if (PM is { AssetMgr: { } }) //Note that PM is perfectly accessible here
    {
        foreach (MachineDriveTrain mdt in PM.AssetMgr.Machines)
            mdt.Run = false;
    }
    int nRet;
    try
    {
        nRet = csc.Execute(); //This runs the install script
        //This stops the analyze loop from doing any work
        if (nRet == ClientServer.CSR_OK)
        {
            PM.AiRestarting = true; //This is the line which consistently fails
            //Wait 1 second then RESTART appliance
            var t = new Thread(() => { PM.RestartAppliance(false); });
            t.Start();
        }
        else
        {
            Utils.StartPostgreSqlService(out _, out _);
            RestoreWebAppAndDbConnection();
        }
    }
    catch (Exception ex)
    {
        nRet = ClientServer.CSR_InstallVersionUpdateFailure;
    }
    return nRet;
}
The problem turned out to be that the DLL file was suddenly being accessed on disk when it hadn't been previously. We were restarting the device, so the window for that to happen was small. When the process checked the file, the signatures didn't match because the file on disk was no longer the one that had been loaded, and the process crashed. The solution was to run the tar command with --recursive-unlink, so that the old files were removed (unlinked) from disk before the new ones were extracted. The running process keeps whatever it already has loaded from the old files, and the new files are written as separate files. After that, the error went away.
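A minimal sketch of why unlinking first makes a difference, under some stated assumptions: the file names are hypothetical, an open file descriptor stands in for the running service (a simplification, since the runtime likely maps assemblies rather than reading them like this), a shell `>` redirect stands in for an in-place overwrite (which, as far as I understand the GNU tar manual, is what --overwrite does), and rm followed by a fresh write stands in for what --recursive-unlink achieves.

```shell
#!/bin/sh
# Sketch: what a process holding a file open sees after the file is
# (1) overwritten in place versus (2) unlinked and recreated.
# File names are hypothetical and live in a throwaway temp directory.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Case 1: overwrite in place. Same inode, so the open descriptor
# now reads the new bytes.
printf 'old' > "$dir/a.dll"
exec 3< "$dir/a.dll"          # stand-in for the running service
printf 'new' > "$dir/a.dll"   # truncates and rewrites the same file
in_place=$(cat <&3)
exec 3<&-

# Case 2: unlink, then write a new file under the same name. The open
# descriptor still refers to the old, now nameless, file.
printf 'old' > "$dir/b.dll"
exec 4< "$dir/b.dll"          # stand-in for the running service
rm "$dir/b.dll"
printf 'new' > "$dir/b.dll"
unlinked=$(cat <&4)
exec 4<&-

echo "in place: $in_place"
echo "unlinked: $unlinked"
```

In the first case the holder of the old file reads the new contents; in the second it still reads the old contents, which matches the behavior we wanted during the couple of seconds before the restart.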