I have a C++ program that I'm building with Clang 3.9's profile-guided optimization feature. Here's what's supposed to happen:
- I build the program with instrumentation enabled.
- I run that program, creating a file with profile data: prof.raw.
- I use llvm-profdata to convert prof.raw to a new file, prof.data.
- I create a new build of that same program, with a few changes:
  - When compiling each .cpp file to a .o file, I use the compiler flag -fprofile-use=prof.data.
  - When linking the executable, I also specify -fprofile-use.
I have a GNU Makefile for this, and it works great. My problem arises now that I'm trying to port that Makefile to CMake (3.7, but I could upgrade). I need the solution to work with (at least) the Unix Makefiles generator, but ideally it would work for all generators.
In CMake, I've defined two executable targets: foo-gen and foo-use:
- When foo-gen is executed, it creates the prof.raw file.
- I use add_custom_command to create a rule that creates prof.data from prof.raw.
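For reference, the two-target setup described above might look roughly like the sketch below. It is a reconstruction under assumptions not stated in the question: CMake 3.13 or newer (for target_link_options), Clang's -fprofile-instr-generate as the instrumentation flag, the LLVM_PROFILE_FILE environment variable to choose where foo-gen writes prof.raw, and placeholder source file names.

```cmake
cmake_minimum_required(VERSION 3.13)
project(foo CXX)

# Placeholder source list; substitute the real .cpp files.
set(FOO_SOURCES main.cpp util.cpp)
set(PROF_RAW  "${CMAKE_CURRENT_BINARY_DIR}/prof.raw")
set(PROF_DATA "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

# Instrumented build: running it writes raw profile data.
add_executable(foo-gen ${FOO_SOURCES})
target_compile_options(foo-gen PRIVATE -fprofile-instr-generate)
target_link_options(foo-gen PRIVATE -fprofile-instr-generate)

# Run foo-gen; LLVM_PROFILE_FILE tells the instrumented binary where to write.
add_custom_command(
  OUTPUT "${PROF_RAW}"
  COMMAND ${CMAKE_COMMAND} -E env "LLVM_PROFILE_FILE=${PROF_RAW}"
          $<TARGET_FILE:foo-gen>
  DEPENDS foo-gen
  VERBATIM)

# Convert prof.raw into the indexed format that -fprofile-use reads.
add_custom_command(
  OUTPUT "${PROF_DATA}"
  COMMAND llvm-profdata merge "-output=${PROF_DATA}" "${PROF_RAW}"
  DEPENDS "${PROF_RAW}"
  VERBATIM)

# Optimized build. Nothing here ties foo-use's object files to prof.data
# (nothing even consumes the prof.data rule yet), which is the missing piece.
add_executable(foo-use ${FOO_SOURCES})
target_compile_options(foo-use PRIVATE "-fprofile-use=${PROF_DATA}")
target_link_options(foo-use PRIVATE -fprofile-use)
```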
My problem is that I can't figure out how to tell CMake that each of the object files depended upon by foo-use has a dependency on the file prof.data.
The most promising idea I had was to (1) find a way to enumerate all of the .o files upon which foo-use depends, and then (2) iterate over each of those .o files, calling add_dependencies for each one.

The problem with this approach is that I can't find an idiomatic way, in my CMakeLists.txt file, to enumerate the list of object files upon which an executable depends. This might be an open problem with CMake.
I also considered using set_source_files_properties to set the OBJECT_DEPENDS property on each of my .cpp files used by foo-use, adding prof.data to that property's list.

The problem with this (AFAICT) is that each of my .cpp files is used to create two different .o files: one for foo-gen and one for foo-use. I want the .o files that get linked into foo-use to have this compile-time dependency on prof.data, but the .o files that get linked into foo-gen must not have a compile-time dependency on prof.data.

And AFAIK, set_source_files_properties doesn't let me set the OBJECT_DEPENDS property to have one of two values, contingent on whether foo-gen or foo-use is the current target of interest.
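To make the OBJECT_DEPENDS idea concrete: it is a source-file property, so a call like the one below affects every target in the same directory that compiles those files. The variable names FOO_SOURCES and PROF_DATA and the source file names are hypothetical placeholders.

```cmake
# Hypothetical placeholders for the shared sources and the profile path.
set(FOO_SOURCES main.cpp util.cpp)
set(PROF_DATA "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

# OBJECT_DEPENDS is attached to the source file, not to a (target, source) pair.
set_source_files_properties(${FOO_SOURCES}
  PROPERTIES OBJECT_DEPENDS "${PROF_DATA}")

# Both targets now compile these sources with a dependency on prof.data:
add_executable(foo-gen ${FOO_SOURCES})  # unwanted: prof.data comes from running foo-gen
add_executable(foo-use ${FOO_SOURCES})  # wanted
```

The foo-gen case is not just wasteful: its objects would wait on prof.data, which can only be produced by running foo-gen, a circular requirement.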
Any suggestions for a clean(ish) way to make this work?
Discussion on author's idea #1
This shouldn't work, according to the documentation for add_dependencies, which states that it makes a top-level target depend on other top-level targets. I.e., you can't use it to make a target depend on files, only on other targets.
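What add_dependencies can express is target-level ordering. A sketch, using a hypothetical helper target name foo-profdata and assuming an add_custom_command with OUTPUT prof.data exists in the same CMakeLists.txt and that foo-use is already defined:

```cmake
set(PROF_DATA "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

# Utility target whose only job is to bring prof.data up to date.
# (foo-profdata is a hypothetical name.)
add_custom_target(foo-profdata DEPENDS "${PROF_DATA}")

# Ensures foo-profdata is built before foo-use starts building...
add_dependencies(foo-use foo-profdata)
# ...but this is ordering only: add_dependencies adds no file-level
# dependency from foo-use's object files to prof.data, so regenerating
# prof.data does not by itself cause those objects to be recompiled.
```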
Discussion on author's idea #2
In the comments section, you mentioned that you could solve this if OBJECT_DEPENDS supported generator expressions, but it doesn't. As a side note, there is an issue ticket tracking this on the CMake GitLab repo. You can give it a thumbs up and describe your use case for their reference.

In the comments section, you also mentioned a possible solution to this:
> You can actually put this into the CMake project via ExternalProject so that it becomes part of the generated buildsystem: make the top-level project include itself as an external project. The external project can be passed a cache variable to configure it to be the -gen version, and the top-level can be the -use version.

Speaking from experience, this is a whole other rabbit hole of long CMake-documentation-reading and finicking sessions if you have never manually invoked or done anything with ExternalProject before, so that answer might belong with a new question dedicated to it.

This can solve the problem of not having generator expressions in OBJECT_DEPENDS, but if you want the top-level project to use a multi-config generator and want some of its configurations not to use PGO, you will be back to square one.

Proposed Solution
Here's what I've found works to make sources recompile when the profile data changes:

- In the add_custom_command that creates prof.data from prof.raw, add a second COMMAND which produces a C++ header file containing a timestamp in a comment.
- Include that header in every .cpp file that should be recompiled when the profile data changes.

If you want to support non-PGO builds, wrap the timestamp header in a header which checks that it exists with __has_include and only includes it if it exists.

I'm pretty sure that with this approach, CMake doesn't do the dependency checking of TUs on the profile data; instead, the generated buildsystem's header-dependency tracking does that work. The rationale for writing a timestamp comment into the header instead of just "touch"ing it to update its filesystem timestamp is that the generated buildsystem might detect changes by file contents instead of by filesystem timestamp.
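A sketch of these steps in CMake, under assumptions not in the original: the names pgo/pgo_profile_stamp.hpp and write_pgo_stamp.cmake are hypothetical, foo-use is already defined, this rule replaces (rather than duplicates) the plain prof.data rule, and the timestamp is written by a small cmake -P script to avoid relying on shell redirection inside COMMAND. The .cpp files are assumed to include the __has_include wrapper described above.

```cmake
set(PROF_RAW         "${CMAKE_CURRENT_BINARY_DIR}/prof.raw")
set(PROF_DATA        "${CMAKE_CURRENT_BINARY_DIR}/prof.data")
set(PGO_STAMP_DIR    "${CMAKE_CURRENT_BINARY_DIR}/pgo")
set(PGO_STAMP_HEADER "${PGO_STAMP_DIR}/pgo_profile_stamp.hpp")

# One rule produces both the profile and the stamp header, so the header's
# contents change every time prof.data is regenerated.
add_custom_command(
  OUTPUT "${PROF_DATA}" "${PGO_STAMP_HEADER}"
  COMMAND llvm-profdata merge "-output=${PROF_DATA}" "${PROF_RAW}"
  COMMAND ${CMAKE_COMMAND} "-DOUT=${PGO_STAMP_HEADER}"
          -P "${CMAKE_CURRENT_SOURCE_DIR}/write_pgo_stamp.cmake"
  DEPENDS "${PROF_RAW}"
  VERBATIM)

# Listing the generated header as a source attaches the rule to foo-use.
# Only foo-use gets the include directory, so in foo-gen the wrapper's
# __has_include check fails and the stamp header is skipped.
target_sources(foo-use PRIVATE "${PGO_STAMP_HEADER}")
target_include_directories(foo-use PRIVATE "${PGO_STAMP_DIR}")

# --- write_pgo_stamp.cmake (separate file; run as cmake -DOUT=<path> -P ...) ---
# string(TIMESTAMP now "%Y-%m-%d %H:%M:%S")
# file(WRITE "${OUT}" "// prof.data regenerated at ${now}\n")
```

Because the header's text differs on each regeneration, a change is visible whether the buildsystem compares timestamps or contents.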
All the shortcomings of the proposed solution
The painfully obvious weakness of this approach is that you need to add a header include to all the .cpp files that you want to check for re-compilation. There are several problems that can spawn from this (from least to most egregious):
1. You might not like it from an aesthetics standpoint.
2. It certainly opens up a hole for human error: forgetting to include the header in new .cpp files. I don't know a general solution, but some compilers have a flag that includes a file in every source file, such as GCC's -include flag and MSVC's /FI flag. You can add such a flag to a CMake target using target_compile_options(<target> PRIVATE "SHELL:-include <path>").
3. You might not be able to change some of the sources that you need to recompile, such as sources from third-party static libraries that your library depends on. There may be workarounds if you're using ExternalProject, by doing something in its patch step, but I don't know.

For my personal project, #1 and #2 are acceptable, and #3 happens not to be an issue. You can take a look at how I'm doing things there if you're interested.
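For the forced-include workaround in item #2, a sketch that force-includes the stamp header directly into foo-use. The header path is a hypothetical name, and the custom command producing it is assumed to be attached to foo-use already. Because only foo-use receives the flag, foo-gen never sees the header, so no __has_include wrapper is needed. One assumption to check: the recompile relies on dependency tracking that sees forced includes, which holds when the build uses compiler-generated dependency files (e.g. with Ninja), whereas a scanner that only reads #include lines in the source text would miss it.

```cmake
# Hypothetical path of the generated stamp header.
set(PGO_STAMP_HEADER "${CMAKE_CURRENT_BINARY_DIR}/pgo/pgo_profile_stamp.hpp")

if(MSVC)
  # MSVC (and clang-cl): /FI<file> force-includes a header into every TU.
  target_compile_options(foo-use PRIVATE "/FI${PGO_STAMP_HEADER}")
else()
  # GCC/Clang: "SHELL:" keeps "-include" and its argument together on the
  # command line (requires CMake >= 3.12); the quotes guard against spaces.
  target_compile_options(foo-use PRIVATE "SHELL:-include \"${PGO_STAMP_HEADER}\"")
endif()
```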
Toward a standard PGO CMake module
See https://gitlab.kitware.com/cmake/cmake/-/issues/19273