In this application I have groups of N (POSIX) threads. The first group starts up, creates an object A, and winds down. A little later a new group of N threads starts up, uses A to create a similar object B, and winds down. This pattern is repeated. The application is highly memory-intensive (A and B hold a large number of malloc'ed arrays), so I would like memory access to be local as much as possible. I can use numactl --localalloc to achieve this, but for that to work I also need to make sure that the threads from the first and second group that work on the same data are bound to the same NUMA node. I've looked into sched_setaffinity, but wonder whether better approaches exist.
The logic of the application is such that doing away with the separate thread groups would tear the program logic apart. That is, a solution where a single group of threads manages first object A and later object B (without winding down in between) would be extremely contrived and would obliterate the object-oriented layout of the code.
Binding the threads in group B to the same cores that the group A threads ran on is more restrictive than what you need. Modern processors have dedicated level 1 (L1) and level 2 (L2) caches per core, so binding a thread to a specific core only pays off when the data is still "hot" in those caches. What you probably want is to bind the group B threads to the same NUMA node as the corresponding group A threads, so that the large arrays stay in the same local memory.
That said, you have two choices:

1. Decide up front which NUMA node each object lives on: bind each group A thread to a chosen node before it allocates (so that, with --localalloc, its arrays land there), and later bind the corresponding group B thread to that same node.
2. Let the group A threads run wherever the scheduler places them, then find out after the fact which node actually holds each object's memory and bind the group B threads accordingly.

Option (1) is relatively easy, so let's talk about how to implement option (2).
The following SO answer describes how to find out, given a virtual address in your process, which NUMA node holds that memory:
Can I get the NUMA node from a pointer address (in C on Linux)?
Armed with that information, you want to set the affinity of your group B threads to that NUMA node. For how to do that, see this SO answer:
How to ensure that std::thread are created in multi core?