I use PPL on a 2-socket Windows machine (16C/32T x 2 = 64 logical cores).
CurrentScheduler->GetNumberOfVirtualProcessors() reports 64 processors.
But concurrency::parallel_for uses only the first socket, and total CPU usage never reaches 100%.
How can I use all sockets (all NUMA nodes) with one parallel_for?
I think you got it wrong...
The concurrency::parallel_for function in the PPL runs on the runtime's default scheduler, so it may NOT distribute the workload evenly across all sockets. So you must create a custom scheduler that explicitly assigns work to each socket. It would look something like this:
It's just a concept; I have NOT tested it yet.
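As a rough illustration of "explicitly assign work to each socket" (this is not the code from the answer above, and it swaps the PPL scheduler for plain std::thread), here is a minimal sketch. It splits the index range into one chunk per worker and pins each worker to a NUMA node with the Win32 calls GetNumaNodeProcessorMaskEx and SetThreadGroupAffinity. The names numa_parallel_for, pin_current_thread_to_node and numa_node_count are made up for this example, and the static chunking is an assumption; on non-Windows builds the pinning is a no-op.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>
#ifdef _WIN32
#define NOMINMAX  // keep windows.h from defining min/max macros
#include <windows.h>
#endif

// Hypothetical helper: restrict the calling thread to the processors of one
// NUMA node. Does nothing outside Windows or if the node lookup fails.
inline void pin_current_thread_to_node(unsigned node) {
#ifdef _WIN32
    GROUP_AFFINITY affinity = {};
    if (GetNumaNodeProcessorMaskEx(static_cast<USHORT>(node), &affinity)) {
        SetThreadGroupAffinity(GetCurrentThread(), &affinity, nullptr);
    }
#else
    (void)node;
#endif
}

// Hypothetical helper: number of NUMA nodes, falling back to 1.
inline unsigned numa_node_count() {
#ifdef _WIN32
    ULONG highest = 0;
    if (GetNumaHighestNodeNumber(&highest)) {
        return static_cast<unsigned>(highest) + 1;
    }
#endif
    return 1;
}

// Hypothetical stand-in for a NUMA-aware parallel_for: calls body(i) once for
// every i in [first, last), using nodes * threads_per_node pinned threads.
template <class Body>
void numa_parallel_for(std::size_t first, std::size_t last,
                       unsigned nodes, unsigned threads_per_node, Body body) {
    const std::size_t workers =
        static_cast<std::size_t>(nodes) * threads_per_node;
    if (workers == 0 || first >= last) return;

    const std::size_t total = last - first;
    const std::size_t chunk = (total + workers - 1) / workers;  // round up

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = first + std::min(total, w * chunk);
        const std::size_t end = first + std::min(total, (w + 1) * chunk);
        if (begin == end) break;  // range exhausted, no more work to hand out
        const unsigned node = static_cast<unsigned>(w / threads_per_node);
        pool.emplace_back([=, &body] {
            pin_current_thread_to_node(node);  // consecutive workers share a node
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();
}
```

On the machine from the question this would be called as numa_parallel_for(0, n, numa_node_count(), 32, body), assuming 32 logical cores per node. The body must be safe to call from several threads at once, the same requirement parallel_for has.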