How to implement affinity on multi-core HT with topological considerations in a C++ program?


I'm developing multi-core C++ programs with a variable number of threads, and I'd like to know how to set a proper (ideally "the best") affinity. I use Boost.Thread, so I can call boost::thread::hardware_concurrency() to find out how many logical cores there are. So far I have simply mapped the n-th thread to the n-th logical core, but that is not the smartest thing to do on multi-socket machines or with Hyper-Threading. My programs are always SIMD-like, so the threads share nothing with each other, and on an HT machine I'd like to bind threads to logical cores in the smartest order I can think of: 1st logical core on the 1st physical core, 1st logical on the 2nd physical, ..., 1st logical on the n-th physical, 2nd logical on the 1st physical, and so on.

I have found plenty of material on how to detect whether HT is enabled (via CPUID) and how to determine the number of logical and physical cores per package. I know I will have to deal with some assembly code, and that doesn't scare me, but I couldn't find out how to get complete information about logical cores, physical cores, and packages, or how the OS deals with all of that.

To be as concise as I can: how can I find the exact location (physical core and package) of the hardware thread that the OS (Windows and Linux) refers to as the N-th?


There are 4 best solutions below

0
On

Issues of topology and affinity in multicore environments are conveniently handled by the LIKWID tool suite. Among other things, it contains tools for determining the topology, pinning threads to cores, and measuring hardware performance metrics:

http://code.google.com/p/likwid

As long as the threading mechanism in a code is based on pthreads and the application is dynamically linked, likwid-pin can bind threads to resources without changing the source code.

1
On

For Windows: GetLogicalProcessorInformation and SetThreadAffinityMask

There is also GetCurrentProcessorNumber(), but the OS frequently moves threads between CPUs when you don't pin them to a specific one, so that's not helpful for your purpose on its own.
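As a rough illustration of how the two calls named above might fit together, here is a minimal sketch. The helper names (cpus_in_mask, cores_from_windows, pin_current_thread) are made up for this example. The Windows part is guarded by _WIN32 and has not been tuned for machines with more than 64 logical processors; as far as I know, GetLogicalProcessorInformation only reports the calling thread's processor group, and GetLogicalProcessorInformationEx is the call for larger systems.

```cpp
#include <cstdint>
#include <vector>

// Expand an affinity mask into the logical processor numbers it contains.
// (Hypothetical helper; bit i set means logical processor i is in the mask.)
std::vector<int> cpus_in_mask(std::uint64_t mask) {
    std::vector<int> cpus;
    for (int bit = 0; mask != 0; ++bit, mask >>= 1)
        if (mask & 1) cpus.push_back(bit);
    return cpus;
}

#ifdef _WIN32
#include <windows.h>

// One entry per physical core, each listing the logical processors on it.
std::vector<std::vector<int>> cores_from_windows() {
    std::vector<std::vector<int>> cores;
    DWORD bytes = 0;
    GetLogicalProcessorInformation(nullptr, &bytes);  // first call only reports the size
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (info.empty() || !GetLogicalProcessorInformation(info.data(), &bytes))
        return cores;
    info.resize(bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    for (const auto& entry : info)
        if (entry.Relationship == RelationProcessorCore)
            cores.push_back(cpus_in_mask(entry.ProcessorMask));
    return cores;
}

// Pin the calling thread to one logical processor; returns false on failure.
bool pin_current_thread(int cpu) {
    return SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << cpu) != 0;
}
#endif
```

Entries whose Relationship is RelationProcessorPackage describe sockets in the same way, so the same mask expansion can be used to find out which logical processors share a package.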

0
On

On Linux, take a look at the man page for sched_setaffinity.
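A minimal sketch of what a call might look like, assuming Linux with glibc; pin_calling_thread_to_cpu is a hypothetical helper name, not something from the man page. Passing 0 as the pid makes sched_setaffinity act on the calling thread.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for cpu_set_t and the CPU_* macros (g++ defines it by default)
#endif
#include <sched.h>
#include <cerrno>

// Restrict the calling thread to a single logical CPU.
// Returns 0 on success, -1 with errno set on failure
// (for example when the CPU is not in the set this process may use).
int pin_calling_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  // pid 0 = calling thread
}
```

Each worker thread would call this once at startup with the logical CPU chosen for it.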

2
On

Here's a code snippet that will give you the CPU topology on Linux.

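#!/bin/bash
# Print the socket and core that each logical CPU belongs to.

# Print the value of every /proc/cpuinfo line whose key matches $1.
filter() {
  grep -E "$1.*: [0-9]*" /proc/cpuinfo | sed -e 's/^.*: //'
}

CPU_ID=($(filter processor))
SOCKET_ID=($(filter 'physical id'))
CORE_ID=($(filter 'core id'))

# The arrays are indexed by logical CPU number, in /proc/cpuinfo order.
for cpu_id in "${CPU_ID[@]}"; do
    echo "cpu $cpu_id: socket${SOCKET_ID[$cpu_id]}_core${CORE_ID[$cpu_id]}"
done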

If I run this on a Core i7 with HT enabled, I get the following output:

cpu 0: socket0_core0
cpu 1: socket0_core1
cpu 2: socket0_core2
cpu 3: socket0_core3
cpu 4: socket0_core0
cpu 5: socket0_core1
cpu 6: socket0_core2
cpu 7: socket0_core3

Here you can see that cpu 0 and cpu 4 are on the same core, i.e. they are the two HT threads of core 0.

Using this in conjunction with either sched_setaffinity(2) or pthread_setaffinity_np(3) will allow you to map your process to a set of CPUs. You can also use taskset(1), which requires no code changes at all.
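For use from inside a C++ program, the same three /proc/cpuinfo fields can be parsed directly and turned into the binding order the question asks for (one thread on every physical core before any core gets a second one). The following is only a sketch under the assumption of the x86 /proc/cpuinfo layout; LogicalCpu, parse_cpuinfo and spread_order are names invented for this example.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// One logical CPU as described by /proc/cpuinfo.
struct LogicalCpu {
    int cpu;     // "processor"
    int socket;  // "physical id"
    int core;    // "core id"
};

// Parse the text of /proc/cpuinfo (x86 layout assumed) into one record per logical CPU.
std::vector<LogicalCpu> parse_cpuinfo(const std::string& text) {
    std::vector<LogicalCpu> cpus;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string key = line.substr(0, colon);
        key.erase(key.find_last_not_of(" \t") + 1);  // drop padding before ':'
        const std::string value = line.substr(colon + 1);
        if (key == "processor") {
            cpus.push_back(LogicalCpu{std::stoi(value), 0, 0});
        } else if (key == "physical id" && !cpus.empty()) {
            cpus.back().socket = std::stoi(value);
        } else if (key == "core id" && !cpus.empty()) {
            cpus.back().core = std::stoi(value);
        }
    }
    return cpus;
}

// Return logical CPU numbers ordered so that the first hardware thread of every
// core comes first, then the second hardware thread of every core, and so on.
std::vector<int> spread_order(std::vector<LogicalCpu> cpus) {
    std::sort(cpus.begin(), cpus.end(),
              [](const LogicalCpu& a, const LogicalCpu& b) { return a.cpu < b.cpu; });
    std::map<std::pair<int, int>, int> seen;            // (socket, core) -> siblings so far
    std::vector<std::tuple<int, int, int, int>> keyed;  // (sibling rank, socket, core, cpu)
    for (const LogicalCpu& c : cpus) {
        const int rank = seen[std::make_pair(c.socket, c.core)]++;
        keyed.emplace_back(rank, c.socket, c.core, c.cpu);
    }
    std::sort(keyed.begin(), keyed.end());
    std::vector<int> order;
    for (const auto& k : keyed) order.push_back(std::get<3>(k));
    return order;
}
```

To use it, read /proc/cpuinfo into a std::string, pass it to parse_cpuinfo, and bind the n-th thread to the n-th entry of spread_order. For the Core i7 listing above the result is simply 0 through 7, so the naive mapping already spreads threads across cores on that machine; on a system where the OS numbers HT siblings adjacently, the order would differ from the identity.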