I think this first happened when we updated our NuGet references, since I can go back, check out the earlier commits, and still build a .NET Native release of our UWP app.
The build itself takes about 20-30 minutes (nearly an hour on our Azure DevOps pipeline) before it fails on the rhbind step.
Microsoft.NetNative.targets(805, 5): RHBIND : error RHB0002: Failed to write PDB.
Microsoft.NetNative.targets(805, 5): ILT0005: 'C:\Users\chris.palmer\.nuget\packages\runtime.win10-x64.microsoft.net.native.compiler\2.2.10-rel-29722-00\tools\x64\ilc\Tools\rhbind.exe @"C:\Projects\FGX\FGX.UWP\obj\x64\Release.NetNative\ilc\intermediate\rhbindargs.FGX.UWP.rsp"' returned exit code 2
I've tried commenting out the
<Assembly Name="*Application*" Dynamic="Required All" />
line in the default.rd.xml file and it had no effect on the error that I could see.
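For reference, the edit looks roughly like this. It is a sketch that assumes the stock layout from the UWP project template; the real file may contain additional directives.

```xml
<!-- Properties\Default.rd.xml, assuming the unmodified UWP template layout -->
<Directives xmlns="http://schemas.microsoft.com/netfx/2013/01/metadata">
  <Application>
    <!-- Commented out for the experiment; it made no visible difference to RHB0002 -->
    <!-- <Assembly Name="*Application*" Dynamic="Required All" /> -->
  </Application>
</Directives>
```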
I've also tried several suggested project file build options, such as <ShortcutGenericAnalysis>true</ShortcutGenericAnalysis> and <UseDotNetNativeSharedAssemblyFrameworkPackage>false</UseDotNetNativeSharedAssemblyFrameworkPackage>.
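A sketch of where those options go, assuming the project file is named FGX.UWP.csproj and the release build uses the Release|x64 configuration (both are guesses based on the paths in the error output):

```xml
<!-- FGX.UWP.csproj (fragment). The file name and the Condition are assumptions. -->
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'Release|x64'">
  <!-- Standard UWP switch that enables the .NET Native toolchain for this configuration -->
  <UseDotNetNativeToolchain>true</UseDotNetNativeToolchain>
  <!-- The two options tried so far; neither changed the error -->
  <ShortcutGenericAnalysis>true</ShortcutGenericAnalysis>
  <UseDotNetNativeSharedAssemblyFrameworkPackage>false</UseDotNetNativeSharedAssemblyFrameworkPackage>
</PropertyGroup>
```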
I branched out from the last good build and updated the references one-by-one and was able to update all of the packages without failing, but when I put those .csproj files from the branch back into our develop branch, the error re-appeared (there were a few new references and several code changes in between there). So, I'm relatively sure it's one of those, but is there any way of knowing? Are there code changes that can cause this?
I'm running out of places to look, and with 30 minutes per build to test it, it's getting old. Any tips, tricks, suggestions, or resources would be welcome. Unfortunately, the .NET Native compilation is required to release our next version into the Windows Store.
Edit: I went back to the last known good build, updated the NuGet packages, and built with no problem. I then moved those .csproj files to the latest commits and, with no changes to the project files, I get the same error. I'm pretty sure it isn’t the packages, per se. Does anyone know of code changes that could cause this?
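This is a complex problem, and even though I have never seen it before, I can help you debug the MSBuild pipeline to solve it. First, we should have a plan. I suggest four steps: building up theories, diagnostic surgery, analyzing the result, and finding an appropriate solution.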
Given the size of your solution, this is likely to take days. The difference between “diagnostic surgery” and “find appropriate solution” is that during the surgery we are free to do anything, even things that are not applicable or acceptable for a production build, like disabling parallelism.
Building up theories
OK, from the MSBuild perspective, what I see is the “Failed to write PDB.” error. A few theories come up right away.

File access concurrency. MSBuild builds projects in parallel both in the IDE and in production pipelines, so it is very easy to hit a race condition while writing files, especially large files or in a large solution. There is some built-in protection, though: the most widely used task, “Copy”, takes its RetryDelayMilliseconds parameter from $(CopyRetryDelayMilliseconds) and its Retries parameter from $(CopyRetryCount). So when a big solution has a big shared bin folder and several nodes write the same files concurrently, these properties can easily resolve it. They are consumed by other tasks as well, such as CreateAppHost from the SDK targets. For a solution this big I definitely recommend setting them anyway; it will not hurt, and it can save lots of machine hours if there is minor concurrency somewhere. But before applying this to your specific solution, we have to analyze Microsoft.NetNative.targets line 805.

Looking at the next line of your error, I can see that you are consuming .NET Native from a NuGet package, and the version is “2.2.10-”, which is a pre-release version; let me download it and see. I definitely do not recommend running production builds on a pre-release unless testing the integration ahead of time is the exact goal. This is likely a buggy version that appeared on Feb 8.

Now I can see that the task being executed is “LoggerBasedExecTask”, and the properties mentioned above are not passed to it. I also know now that the target being executed is “BuildNativePackage”. We can isolate all “BuildNativePackage” stages and disable parallelism to avoid concurrency there. In fact, applied at the right stage, this can even speed up the build, because this stage may be executed multiple times on the same artifacts. However, the target declares no inputs and outputs that MSBuild could use to skip redundant runs, so later you can collect diagnostic logs to get a better picture of which stage to hook in order to disable parallelism; a stage needs input and output parameters to take advantage of this.

The entire file is about 1000 lines of MSBuild code, so it is not the end of the world; we could analyze any issue in it and even optimize it. The file also uses no retry mechanism at all, no parallelism-specific attributes, and no “input” / “output”, so it does not look well optimized. We have to try to avoid running BuildNativePackage targets in parallel.
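As a concrete form of the retry settings recommended above, here is a minimal sketch of a Directory.Build.props placed next to the solution. The values 20 and 500 are illustrative assumptions, not tuned numbers, and as noted these properties only reach tasks that read them (such as Copy), not the rhbind step itself.

```xml
<!-- Directory.Build.props: imported automatically by MSBuild 15 and later
     for every project located below this folder. -->
<Project>
  <PropertyGroup>
    <!-- How many times Copy (and other tasks reading this property) retries a failed write.
         20 is an illustrative value. -->
    <CopyRetryCount>20</CopyRetryCount>
    <!-- Delay between retries in milliseconds. 500 is an illustrative value. -->
    <CopyRetryDelayMilliseconds>500</CopyRetryDelayMilliseconds>
  </PropertyGroup>
</Project>
```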
Diagnostic surgery
OK, the easiest way to start is to disable parallel builds completely and see whether that helps, to prove we are going in the right direction. For this you should alter your Azure pipeline (create a DRAFT). If you are building with msbuild, we just have to remove “-m” from the command line. If you are building with “dotnet build”, pass “-m:1”; note that “--disable-parallel” only affects package restore, not the build itself. This will increase the build time.
Analyze the result
So far this is the only theory, so let’s see your comments. Hopefully this will not increase the build time too much; my usual expectation is about 30%, because even large solutions quite often parallelize poorly: they have lots of small independent projects and a few really big ones that are hard to build and take most of the time.
Find appropriate solution
I am not sure whether the NuGet package you are trying to upgrade is Microsoft.Net.Native.Compiler itself, to the pre-release 2.2.10-rel; if so, I think it is a bad idea. Anyway, we can try to add some hooks to make sure the targets we care about do not run in parallel. It is really too early to describe an appropriate solution; you can think of this as another stage of the surgery, and we will see. Try to put this Directory.Build.targets near your solution. I've tested it on my artificial case with concurrent file writes, where it solves the problem.
Here you can fine-tune things, e.g. lock every entrance to this target or lock it per project, lock other upper-level targets, etc.
Good luck! Even if this is not the end of it, I hope it at least gives you a decent start for digging deeper.