I'm a long-time user of cachegrind for program profiling, and recently went back to check the official documentation once more: https://valgrind.org/docs/manual/cg-manual.html
It refers repeatedly to CPU models, implementation decisions and simulation models from the mid-2000s, and it also states that some behavior has changed on "modern" processors:
> the LL cache typically replicates all the entries of the L1 caches [...] This is standard on Pentium chips, but AMD Opterons, Athlons and Durons use an exclusive LL cache [...]
>
> Cachegrind simulates branch predictors intended to be typical of mainstream desktop/server processors of around 2004.
>
> More recent processors have better branch predictors [...] Cachegrind's predictor design is deliberately conservative so as to be representative of the large installed base of processors which pre-date widespread deployment of more sophisticated indirect branch predictors. In particular, late model Pentium 4s (Prescott), Pentium M, Core and Core 2 have more sophisticated indirect branch predictors than modelled by Cachegrind.
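To make the first quoted point concrete, here is a toy sketch of how I understand the inclusive vs. exclusive distinction. It is not Cachegrind's implementation: the `LRU` class, the `memory_misses` function, the fully associative LRU caches and the tiny cache sizes are all assumptions I made up for illustration.

```python
from collections import OrderedDict


class LRU:
    """Fully associative LRU set of cache-line addresses (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, line):
        """Return True on a hit and mark the line most recently used."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        return False

    def insert(self, line):
        """Insert a line; return the evicted line, or None if nothing was evicted."""
        self.lines[line] = True
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None

    def remove(self, line):
        self.lines.pop(line, None)


def memory_misses(trace, l1_lines, ll_lines, exclusive):
    """Count accesses that miss both L1 and LL for a trace of line addresses."""
    l1, ll = LRU(l1_lines), LRU(ll_lines)
    misses = 0
    for line in trace:
        if l1.touch(line):
            continue
        if exclusive:
            # Exclusive: a line lives in L1 or in LL, never in both.
            if ll.touch(line):
                ll.remove(line)           # line moves up into L1
            else:
                misses += 1
            victim = l1.insert(line)
            if victim is not None:
                ll.insert(victim)         # L1 victim moves down into LL
        else:
            # Inclusive: every L1 line is also held in LL.
            if not ll.touch(line):
                misses += 1
                evicted = ll.insert(line)
                if evicted is not None:
                    l1.remove(evicted)    # keep L1 a subset of LL
            l1.insert(line)
    return misses


if __name__ == "__main__":
    trace = list(range(6)) * 10           # cycle over 6 distinct lines
    print("inclusive:", memory_misses(trace, 2, 4, exclusive=False))
    print("exclusive:", memory_misses(trace, 2, 4, exclusive=True))
```

With a 2-line L1 and a 4-line LL, this toy model reports 60 misses for the inclusive hierarchy and 6 for the exclusive one, because the exclusive pair can hold all 6 distinct lines while the inclusive LL thrashes. Real hardware (set associativity, replacement policy, prefetching) will behave differently, which is part of what I am asking about.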
Now I'm wondering:
- how many of these choices still apply in 2021 when developing on latest-gen CPUs,
- whether cachegrind's implementation has since been updated to reflect recent CPUs and only the manual is outdated,
- whether cachegrind shows skewed results on modern CPUs due to its simulation of legacy behavior.
Any insight is greatly appreciated!