How can I express PGO dependencies in CMake 3.7 and higher?


I have a C++ program that I'm building with Clang 3.9's profile-guided optimization feature. Here's what's supposed to happen:

  1. I build the program with instrumentation enabled.
  2. I run that program, creating a file with profile-data: prof.raw.
  3. I use llvm-profdata to convert prof.raw to a new file, prof.data.
  4. I create a new build of that same program, with a few changes:
    • When compiling each .cpp file to a .o file, I use the compiler flag -fprofile-use=prof.data.
    • When linking the executable, I also specify -fprofile-use.

I have a GNU Makefile for this, and it works great. My problem arises now that I'm trying to port that Makefile to CMake (3.7, but I could upgrade). I need the solution to work with (at least) the Unix Makefiles generator, but ideally it would work for all generators.

In CMake, I've defined two executable targets: foo-gen and foo-use:

  • When foo-gen is executed, it creates the prof.raw file.
  • I use add_custom_command to create a rule to create prof.data from prof.raw.

My problem is that I can't figure out how to tell CMake that each of the object files depended upon by foo-use has a dependency on the file prof.data.

  • The most promising idea I had was to (1) find a way to enumerate all of the .o files upon which foo-use depends, and then (2) iterate over each of those .o files, calling add_dependencies for each one.

    The problem with this approach is I can't find an idiomatic way, in my CMakeLists.txt file, to enumerate the list of object files upon which an executable depends. This might be an open problem with CMake.

  • I also considered using set_source_files_properties to set the OBJECT_DEPENDS property on each of my .cpp files used by foo-use, adding prof.data to that property's list.

    The problem with this (AFAICT) is that each of my .cpp files is used to create two different .o files: one for foo-gen and one for foo-use. I want the .o files that get linked into foo-use to have this compile-time dependency on prof.data; but the .o files that get linked into foo-gen must not have a compile-time dependency on prof.data.

    And AFAIK, set_source_files_properties doesn't let me set the OBJECT_DEPENDS property to have one of two values, contingent on whether foo-gen or foo-use is the current target of interest.

Any suggestions for a clean(ish) way to make this work?


There are 2 best solutions below


The standard way of doing this is to go through the configure+build workflow twice.

The first time, you create the initial build tree and build with both CMAKE_C_FLAGS and CMAKE_CXX_FLAGS set to a value containing -fprofile-generate. Then you run the unit tests you're interested in running. For instance, you might have a CTest label named performance.

Then the second time, you reconfigure in place with -fprofile-use instead. Then you can build, run all of your tests and install or package.

This can all be orchestrated without too much hassle using CMake presets, which I'll show in a minute.

First off, here's the CMakeLists.txt.

cmake_minimum_required(VERSION 3.28)
project(pgo-test)

enable_testing()

add_executable(example main.cpp)

add_test(NAME runs COMMAND example)
add_test(NAME perf COMMAND example --perf)

set_tests_properties(
  perf
  PROPERTIES
  LABELS "performance"
)

This is the easy part. My main.cpp is just a basic "Hello, world!", provided here for ease of copy/paste.

#include <iostream>
int main () { std::cout << "Hello, world!\n"; return 0; }

Now we can write the CMakePresets.json. I'd recommend consulting the CMake presets documentation before reading this.

{
  "version": 6,
  "configurePresets": [
    {
      "name": "base",
      "hidden": true,
      "binaryDir": "build",
      "cacheVariables": {
        "CMAKE_C_FLAGS": "$env{PGO_FLAGS}",
        "CMAKE_CXX_FLAGS": "$env{PGO_FLAGS}"
      }
    },
    {
      "name": "pgo-gen",
      "inherits": "base",
      "environment": { "PGO_FLAGS": "-fprofile-generate" }
    },
    {
      "name": "pgo-use",
      "inherits": "base",
      "environment": { "PGO_FLAGS": "-fprofile-use" }
    }
  ],
  "buildPresets": [
    { "name": "pgo-gen", "configurePreset": "pgo-gen" },
    { "name": "pgo-use", "configurePreset": "pgo-use" }
  ],
  "testPresets": [
    {
      "name": "pgo-gen",
      "configurePreset": "pgo-gen",
      "filter": { "include": { "label": "performance" } }
    }
  ],
  "workflowPresets": [
    {
      "name": "pgo-gen",
      "steps": [
        { "type": "configure", "name": "pgo-gen" },
        { "type": "build", "name": "pgo-gen" },
        { "type": "test", "name": "pgo-gen" }
      ]
    },
    {
      "name": "pgo-use",
      "steps": [
        { "type": "configure", "name": "pgo-use" },
        { "type": "build", "name": "pgo-use" }
      ]
    }
  ]
}

Basically what happens here is that we create two configure presets, pgo-gen and pgo-use, both of which have a common base they inherit from. This is just to show a technique for incrementally changing parts of cache variables using the environment as a side-channel.

Then we create two corresponding build presets and a test preset for pgo-gen that only runs the performance-labeled test cases.

Finally, we create two workflows to drive this. The end-to-end commands are:

$ cmake --workflow --preset pgo-gen
Executing workflow step 1 of 3: configure preset "pgo-gen"

Preset CMake variables:

  CMAKE_CXX_FLAGS="-fprofile-generate"
  CMAKE_C_FLAGS="-fprofile-generate"

Preset environment variables:

  PGO_FLAGS="-fprofile-generate"

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (6.3s)
-- Generating done (0.0s)
-- Build files have been written to: /home/alex/test/build

Executing workflow step 2 of 3: build preset "pgo-gen"

[2/2] Linking CXX executable example

Executing workflow step 3 of 3: test preset "pgo-gen"

Test project /home/alex/test/build
    Start 2: perf
1/1 Test #2: perf .............................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
performance    =   0.00 sec*proc (1 test)

Total Test time (real) =   0.00 sec
$ cmake --workflow --preset pgo-use
Executing workflow step 1 of 2: configure preset "pgo-use"

Preset CMake variables:

  CMAKE_CXX_FLAGS="-fprofile-use"
  CMAKE_C_FLAGS="-fprofile-use"

Preset environment variables:

  PGO_FLAGS="-fprofile-use"

-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/alex/test/build

Executing workflow step 2 of 2: build preset "pgo-use"

[2/2] Linking CXX executable example

And if I delete the build directory and try pgo-use without gen:

$ rm -rf build
$ cmake --workflow --preset pgo-use
Executing workflow step 1 of 2: configure preset "pgo-use"

Preset CMake variables:

  CMAKE_CXX_FLAGS="-fprofile-use"
  CMAKE_C_FLAGS="-fprofile-use"

Preset environment variables:

  PGO_FLAGS="-fprofile-use"

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (6.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/alex/test/build

Executing workflow step 2 of 2: build preset "pgo-use"

[1/2] Building CXX object CMakeFiles/example.dir/main.cpp.o
/home/alex/test/main.cpp: In function ‘(static initializers for /home/alex/test/main.cpp)’:
/home/alex/test/main.cpp:5:1: warning: ‘/home/alex/test/build/CMakeFiles/example.dir/main.cpp.gcda’ profile count data file not found [-Wmissing-profile]
    5 | }
      | ^
[2/2] Linking CXX executable example

Then the compiler warns that the profile data is missing, as expected.

I initially wanted to write a single pgo workflow that consisted of all five steps, but sadly CMake currently doesn't allow two configure steps in a workflow. Why? I don't know. I'm going to open a bug report.


Discussion on author's idea #1

The most promising idea I had was to (1) find a way to enumerate all of the .o files upon which foo-use depends, and then (2) iterate over each of those .o files, calling add_dependencies for each one.

This shouldn't work according to the documentation for add_dependencies, which states:

Makes a top-level <target> depend on other top-level targets to ensure that they build before <target> does.

I.e., you can't use it to make a target depend on files, only on other targets.
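Since add_dependencies only accepts targets, the usual workaround is to wrap the file in a custom target. The following is a minimal sketch under stated assumptions, not a complete solution: foo-gen and foo-use are the executables from the question, foo-profile is a hypothetical target name, and the raw profile location is set through Clang's LLVM_PROFILE_FILE environment variable. It only guarantees that prof.data is produced before foo-use is built; it does not make existing object files recompile when prof.data changes.

```cmake
# Sketch only: `foo-profile` is a hypothetical target name; `foo-gen` and
# `foo-use` are the executables from the question.
set(prof_raw  "${CMAKE_CURRENT_BINARY_DIR}/prof.raw")
set(prof_data "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

# Run the instrumented binary, then convert its raw profile.
# LLVM_PROFILE_FILE tells Clang's profiling runtime where to write the raw data.
add_custom_command(
  OUTPUT "${prof_data}"
  COMMAND "${CMAKE_COMMAND}" -E env "LLVM_PROFILE_FILE=${prof_raw}"
          $<TARGET_FILE:foo-gen>
  COMMAND llvm-profdata merge -o "${prof_data}" "${prof_raw}"
  DEPENDS foo-gen
  COMMENT "Generating prof.data"
  VERBATIM
)

# A file cannot be passed to add_dependencies(), but a target wrapping it can.
add_custom_target(foo-profile DEPENDS "${prof_data}")
add_dependencies(foo-use foo-profile)
```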

Discussion on author's idea #2

I also considered using set_source_files_properties to set the OBJECT_DEPENDS property on each of my .cpp files used by foo-use, adding prof.data to that property's list.

The problem with this (AFAICT) is that each of my .cpp files is used to create two different .o files: one for foo-gen and one for foo-use. I want the .o files that get linked into foo-use to have this compile-time dependency on prof.data; but the .o files that get linked into foo-gen must not have a compile-time dependency on prof.data.

And AFAIK, set_source_files_properties doesn't let me set the OBJECT_DEPENDS property to have one of two values, contingent on whether foo-gen or foo-use is the current target of interest.

In the comments section, you mentioned that you could solve this if OBJECT_DEPENDS supported generator expressions, but it doesn't. As a side note, there is an issue ticket tracking this in the CMake GitLab repository. You can give it a thumbs up and describe your use case for their reference.
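To make the limitation concrete, here is a small sketch of what the OBJECT_DEPENDS attempt looks like when both executables are defined in the same directory from the same sources. FOO_SOURCES and the file names in it are hypothetical; the rest follows the question's setup.

```cmake
# Sketch only: `FOO_SOURCES` and its contents are hypothetical.
set(FOO_SOURCES main.cpp util.cpp)

add_executable(foo-gen ${FOO_SOURCES})
add_executable(foo-use ${FOO_SOURCES})

# The property is attached to the source file, not to a (source, target) pair,
# so the objects of BOTH executables pick up the dependency on prof.data.
# For foo-gen that is circular: prof.data is produced by running foo-gen.
set_source_files_properties(${FOO_SOURCES}
  PROPERTIES OBJECT_DEPENDS "${CMAKE_CURRENT_BINARY_DIR}/prof.data"
)
```

A generator expression that evaluates differently per target would be the natural way to restrict the property to foo-use, which is exactly what is not supported here.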

In the comments section you also mentioned a possible solution to this:

Potential other solution a) double project system where main user invoked project forwards settings to second pgo project compiling same settings again.

You can actually put this into the CMake project via ExternalProject so that it becomes part of the generated buildsystem: Make the top-level project include itself as an external project. The external project can be passed a cache variable to configure it to be the -gen version, and the top-level can be the -use version.

Speaking from experience, this is a whole other rabbit hole of long CMake-documentation-reading and finicking sessions if you have never manually invoked or done anything with ExternalProject before, so that answer might belong with a new question dedicated to it.

This works around the lack of generator expressions in OBJECT_DEPENDS, but if you want the top-level project to be multi-config and some of its configurations should not use PGO, you will be back to square one.
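As a rough illustration of the self-including layout described above, the fragment below shows one way it could be wired up. It is a sketch under assumptions: FOO_PGO_GEN and foo-gen-build are hypothetical names, the flags assume a GCC- or Clang-style compiler, and running the training step and consuming the profile in the top-level build are left out.

```cmake
# Sketch only: `FOO_PGO_GEN` and `foo-gen-build` are hypothetical names.
include(ExternalProject)

option(FOO_PGO_GEN "Configure this tree as the instrumented (-gen) build" OFF)

if(FOO_PGO_GEN)
  # Nested build: instrument everything. CMAKE_CXX_FLAGS is used for both
  # compiling and linking, so the profiling runtime gets linked in as well.
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-generate")
else()
  # Top-level build: add this same source tree as an external project,
  # configured as the instrumented variant in its own binary directory.
  ExternalProject_Add(foo-gen-build
    SOURCE_DIR "${CMAKE_SOURCE_DIR}"
    BINARY_DIR "${CMAKE_BINARY_DIR}/pgo-gen"
    CMAKE_ARGS -DFOO_PGO_GEN=ON
    INSTALL_COMMAND ""  # nothing to install; only the build output is needed
  )
endif()

add_executable(foo main.cpp)
```

Because the nested build sets FOO_PGO_GEN=ON, it skips the ExternalProject_Add branch and does not recurse.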

Proposed Solution

Here's what I've found works to make sources re-compile when profile data changes:

  1. To the custom command which runs the training executable and produces and re-formats the training data, add another COMMAND which produces a C++ header file containing a timestamp in a comment.
  2. Include that header in all sources which you want to re-compile if the training has been re-run.

If you want to support non-PGO builds, wrap the timestamp header in a header which checks that it exists with __has_include and only includes it if it exists.
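The timestamp step can be sketched as follows. This is only an illustration: pgo_stamp.h and write_pgo_stamp.cmake are hypothetical names, foo-use is the target from the question, and the training and conversion commands are represented by a comment rather than spelled out.

```cmake
# Sketch only: `pgo_stamp.h` and `write_pgo_stamp.cmake` are hypothetical names.
set(stamp_header "${CMAKE_CURRENT_BINARY_DIR}/pgo_stamp.h")
set(stamp_script "${CMAKE_CURRENT_BINARY_DIR}/write_pgo_stamp.cmake")

# Helper script, written at configure time and run at build time with `cmake -P`.
# The bracket argument keeps ${OUT} and ${now} unexpanded until the script runs.
file(WRITE "${stamp_script}" [=[
string(TIMESTAMP now UTC)
file(WRITE "${OUT}" "// profile data regenerated at ${now}\n")
]=])

add_custom_command(
  OUTPUT "${stamp_header}"
  # The existing COMMANDs that run the training executable and convert the
  # profile data would come first.
  COMMAND "${CMAKE_COMMAND}" "-DOUT=${stamp_header}" -P "${stamp_script}"
  COMMENT "Refreshing PGO stamp header"
  VERBATIM
)

# Let sources in foo-use find the generated header by name.
target_include_directories(foo-use PRIVATE "${CMAKE_CURRENT_BINARY_DIR}")
```

Writing the timestamp into the file body means the header's contents change on every training run, which covers build tools that compare contents rather than modification times.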

I'm pretty sure with this approach, CMake doesn't do the dependency checking of TUs on the profile data, and instead, it's the generated buildsystem's header-dependency tracking which does that work. The rationale for including a timestamp comment in the header file instead of just "touch"ing it to change the timestamp in the filesystem is that the generated buildsystem might detect changes by file contents instead of by the filesystem timestamp.

All the shortcomings of the proposed solution

The painfully obvious weakness of this approach is that you need to add a header include to all the .cpp files that you want to check for re-compilation. There are several problems that can spawn from this (from least to most egregious):

  1. You might not like it from an aesthetics standpoint.

  2. It certainly opens up a hole for human error: forgetting to include the header in new .cpp files. One way to mitigate this is that some compilers have a flag that includes a file in every source file, such as GCC's -include flag and MSVC's /FI flag. You can add this flag to a CMake target using target_compile_options(<target> PRIVATE "SHELL:-include <path>").

  3. You might not be able to change some of the sources that you need to re-compile, such as sources from third-party static libraries that your library depends on. There may be workarounds if you're using ExternalProject by doing something with the patch step, but I don't know.
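For the forced-include mitigation mentioned in item 2, a minimal sketch might look like the following. The header path is a hypothetical placeholder and foo-use is the target from the question; note that the SHELL: prefix requires CMake 3.12 or newer, and that a path containing spaces would need extra quoting.

```cmake
# Sketch only: the header path is a hypothetical placeholder.
set(stamp_header "${CMAKE_CURRENT_BINARY_DIR}/pgo_stamp.h")

if(MSVC)
  # MSVC spells the forced-include flag /FI, with the path attached.
  target_compile_options(foo-use PRIVATE "/FI${stamp_header}")
else()
  # GCC and Clang use -include. The SHELL: prefix keeps the flag and its
  # argument together so option de-duplication does not separate them.
  target_compile_options(foo-use PRIVATE "SHELL:-include ${stamp_header}")
endif()
```

With a forced include, the header must exist before any source in the target is compiled, so this variant does not combine with the __has_include fallback for non-PGO builds.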

For my personal project, #1 and #2 are acceptable, and #3 happens to not be an issue. You can take a look at how I'm doing things there if you're interested.

Toward a standard PGO CMake module

See https://gitlab.kitware.com/cmake/cmake/-/issues/19273