Cache efficiency of using many static variables with template metaprogramming in C++

Suppose I have the C++ code below:

template <int Id>
struct Foo {
    static inline int x {};
};

int main() {
    using T0 = Foo<0>;
    using T1 = Foo<1>;
    ...
    using T999 = Foo<999>;
    // Do some stuff with the various x's at runtime
    int result = T0::x + T1::x + ... + T999::x;
}

Now consider the sum in the final line. Is there any guarantee that the various static x's will be stored contiguously in memory, so that this sum is cache-efficient? Or should one assume in general that the static variables are stored in arbitrary order at arbitrary addresses, which would not be cache-efficient?

If the former, is it because the using aliases appear in order? Is a template instantiated merely because a using alias names it, or does the template actually have to be used (in which case the variables might be stored in order of use)?

If the latter, would it solve the problem to use a tuple of size 1000, with each Foo<N>::x referring to the N-th element of that tuple?

There are 3 best solutions below

0
MartinYakuza On

Although the variables may happen to be allocated contiguously, this is not guaranteed. It is better to create an array and define constant indexes that refer to particular elements, e.g. constexpr int variable7_ind = 7, and then use T[variable7_ind] instead of T7::x.
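
A minimal sketch of that idea, assuming C++17 and using std::array for the storage. The names kCount, T, the variableN_ind constants and sum_all are illustrative and not taken from the question:

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical size: one slot per former Foo<N>::x.
inline constexpr std::size_t kCount = 1000;

// Named indexes replace the separate Foo<N> specializations.
inline constexpr std::size_t variable0_ind = 0;
inline constexpr std::size_t variable7_ind = 7;
inline constexpr std::size_t variable999_ind = 999;

// std::array stores its elements contiguously, so the layout is guaranteed.
inline std::array<int, kCount> T{};

// Sum every slot with a single linear pass over the contiguous storage.
inline int sum_all() {
    return std::accumulate(T.begin(), T.end(), 0);
}
```

With this layout, T[variable7_ind] = 5; writes the eighth element, and sum_all() walks memory front to back.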

4
user23773995 On

The code is

#include <iostream>

template <int Idx>
struct Foo {
  static inline int x{};
};

template <int Idx>
struct Bar {
  static inline int x{};
};

int main() {
  using A = Foo<0>;
  using B = Foo<1>;
  using C = Foo<2>;
  using D = Bar<0>;
  std::cout << &B::x << std::endl;
  std::cout << &C::x << std::endl;
  std::cout << &D::x << std::endl;
  std::cout << &A::x << std::endl;
}

My system is:

Homebrew clang version 17.0.6
Target: arm64-apple-darwin22.1.0
Thread model: posix

And the output is

0x100dc4000
0x100dc4004
0x100dc4008
0x100dc400c

On this system, then, the sum will be cache-efficient as long as the

// Do some stuff with the various x's at runtime

part uses the different x's in order, because that is the order in which they are stored in memory. The layout follows the order of first use, not the order of the using aliases.

NOTE: I don't know whether this holds on all systems, but that is how it behaves on mine. Print the addresses and profile on yours to make sure.

0
273K On

Alias declarations like using T0 = Foo<0>; do not instantiate templates.
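
One way to see this is a template that refuses to be instantiated. The sketch below assumes C++17; Boom and Alias are hypothetical names made up for the illustration:

```cpp
#include <type_traits>

// Hypothetical template: instantiating it with a non-negative Id is a
// compile-time error because the static_assert fires.
template <int Id>
struct Boom {
    static_assert(Id < 0, "Boom<Id> was instantiated");
    static inline int x{};
};

// This compiles: the alias only names the specialization Boom<1>
// and does not require it to be a complete type.
using Alias = Boom<1>;

// Comparing the types does not instantiate Boom<1> either.
static_assert(std::is_same_v<Alias, Boom<1>>);

// Uncommenting the next line would instantiate Boom<1> and fail to compile:
// int use = Alias::x;
```

The specialization is only instantiated once something needs its definition, such as accessing Alias::x.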

int result = T0::x + T1::x + ... + T999::x;

The evaluation order of the operands of + is unspecified, and the language does not specify the order in which the instantiated variables are placed in memory either.

https://godbolt.org/z/rzWvjbEaT

func():
        mov     eax, DWORD PTR Foo<1>::x[rip]
        add     eax, DWORD PTR Foo<0>::x[rip]
        add     eax, DWORD PTR Foo<999>::x[rip]
        ret
Foo<999>::x:
        .long   999
Foo<1>::x:
        .long   1
Foo<0>::x:
        .zero   4

Access order: 1, 0, 999. In-memory order: 999, 1, 0.

If you need a specific order, use arrays and loops.
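
If the Foo<N> interface should be kept, one possible sketch (assuming C++17) backs every specialization with a slot in a single array. The names storage, kCount and sum_all are hypothetical, and x becomes an accessor function here rather than a variable, which is a deliberate change from the question's code:

```cpp
#include <array>
#include <cstddef>

// Hypothetical size matching Foo<0> .. Foo<999>.
inline constexpr std::size_t kCount = 1000;

// One contiguous block that holds every value.
inline std::array<int, kCount> storage{};

template <std::size_t Id>
struct Foo {
    static_assert(Id < kCount, "Id out of range");

    // Returns a reference to this specialization's slot in the shared array.
    static int& x() { return storage[Id]; }
};

// A plain loop over the array replaces the 1000-term sum.
inline int sum_all() {
    int total = 0;
    for (int v : storage) {
        total += v;
    }
    return total;
}
```

Here Foo<7>::x() = 3; writes storage[7], so the memory order is fixed by the index and not by the order of use or instantiation.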