#include <glibmm/thread.h>
#include <sys/sysinfo.h>
#include <functional>   // std::bind
#include <sched.h>      // sched_yield
#include <signal.h>     // signal, SIGINT
#include <stdio.h>
#include <stdlib.h>     // exit
#include <time.h>       // clock_nanosleep, struct timespec

// Every thread hammers the same shared int, with no synchronization (on purpose).
void threadLoop(int *PtrCounter)
{
    struct timespec sleep = {0};
    while (1)
    {
        *PtrCounter += 1; // #1: commenting this fixes the slow responsiveness
        //clock_nanosleep(CLOCK_MONOTONIC, 0, &sleep, NULL); // #2 uncommenting this fixes the slow responsiveness
        // sched_yield(); // #3 uncommenting this does nothing (still get slow responsiveness)
    }
}

int counter = 0;

// Ctrl-C: print whatever the (racy) counter reached and quit.
void ExitHandler(int signum)
{
    printf("counter=%d\n", counter);
    exit(0);
}

int main(int argc, char *argv[])
{
    int numThreads = get_nprocs(); // one busy thread per available CPU
    int i;
    printf("using %d threads... (Ctrl-C to stop)\n", numThreads);
    signal(SIGINT, ExitHandler);
    Glib::Thread *threadArray[numThreads];
    for (i=0; i<numThreads; i++)
    {
        threadArray[i] = Glib::Thread::create(std::bind(&threadLoop, &counter));
    }
    // never returns
    for (i=0; i<numThreads; i++)
    {
        threadArray[i]->join();
    }
}
Compiled without optimization, so the asm does what the source says (every thread loads and stores counter):
g++ `pkg-config --libs --cflags glibmm-2.4` threads.cpp
I'm running this code on Fedora 38, kernel 6.5.8-200, on a Dell PowerEdge R760 (Xeon Platinum 8480+, 256G memory, get_nprocs() returns 224).
When I run this code, the system becomes horribly unresponsive (slow to respond to mouse clicks and key presses). Navigating around in Firefox becomes unbearably slow. Why is this happening?
Your first reaction may be that this is horrible code because *PtrCounter is not protected by any sort of synchronization, and you are correct. I originally used __atomic_add_fetch() to do the increment and saw the same slow responsiveness. While trying to get to a minimal reproducible example, I removed the atomic stuff. So yes, it will count incorrectly, but I think this helps demonstrate the issue.
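For reference, the atomic variant would look roughly like the sketch below. It is not my original code: it uses std::atomic and std::thread as stand-ins for __atomic_add_fetch() and Glib::Thread, and it is bounded so that it terminates. The name run_atomic_counter and its parameters are made up for this illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: every thread increments one shared atomic counter.
// Bounded so it terminates, unlike the endless loop in the reproducer.
long run_atomic_counter(unsigned num_threads, long iterations_per_thread)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&counter, iterations_per_thread] {
            for (long i = 0; i < iterations_per_thread; ++i)
            {
                // Atomic read-modify-write on a single shared cache line.
                counter.fetch_add(1, std::memory_order_seq_cst);
            }
        });
    }
    for (std::thread &th : threads)
    {
        th.join();
    }
    return counter.load();
}
```

Unlike the plain increment, this version counts correctly: the result is always num_threads * iterations_per_thread. The contention on the one cache line is still there.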
Your next reaction is probably, "Of course you are going to get some sluggishness, you are hogging the CPU". Yes, I'm hogging the cpu, but that in itself does not cause this sluggishness. As "#1" in the comments says, if I comment out the "*PtrCounter +=1" the sluggishness goes away and in that case I am still hogging the cpu just as much (as seen from top).
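One variant that might help narrow this down is to keep the increment but give every thread its own counter on its own cache line, so the CPUs stay just as busy storing to memory without all sharing one line. This is only a sketch of that idea and I have not measured it on the R760. The 64-byte line size is an assumption, the over-aligned vector needs C++17, and PaddedCounter and run_private_counters are names invented for this example. It is bounded so that it terminates.

```cpp
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Each counter gets a line to itself so
// threads never write to the same line.
struct alignas(64) PaddedCounter
{
    // volatile keeps every increment a real load and store, even with -O2.
    volatile long value = 0;
};

// Hypothetical helper: each thread increments only its own padded counter.
long run_private_counters(unsigned num_threads, long iterations_per_thread)
{
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&counters, t, iterations_per_thread] {
            for (long i = 0; i < iterations_per_thread; ++i)
            {
                counters[t].value = counters[t].value + 1;
            }
        });
    }
    for (std::thread &th : threads)
    {
        th.join();
    }
    // Sum the per-thread counters after all threads have finished.
    long total = 0;
    for (const PaddedCounter &c : counters)
    {
        total += c.value;
    }
    return total;
}
```

If an endless version of this loop leaves the desktop responsive while the shared-counter version does not, that would point at contention on the single cache line rather than at CPU usage as such.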
The other thing is that if I run this code on other systems (a much older 8-CPU Intel, or a newer 128-CPU AMD), I do not get the sluggishness. I invite you to try the code on your system. Unless it is a newer, high-CPU-count Intel system, I bet it works just fine and you won't experience sluggishness.
You can see the #2 and #3 comments in the code. Not sure what to make of it but I thought it was interesting enough to mention.
So what is at the root of this sluggishness? I get that heavy contention is bad: it causes a lot of cache thrashing and can reduce the performance of your algorithm. But I don't believe it should impact the responsiveness of the OS. Shouldn't the OS force involuntary context switches and handle interrupts just as fast whether or not those algorithms are hitting contention?
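To put a number on "sluggish" instead of judging by mouse clicks, a small probe like the one below could be run in a separate process while the reproducer is spinning. It is a minimal sketch, not something I have results from: it asks to sleep for a short interval and reports how much later than requested the thread actually resumed. The name wakeup_overshoot is made up for this example.

```cpp
#include <chrono>
#include <thread>

// Hypothetical probe: sleep for `requested` and return how much longer than
// that the sleep really took (the wakeup delay as seen by this thread).
std::chrono::nanoseconds wakeup_overshoot(std::chrono::nanoseconds requested)
{
    using clock = std::chrono::steady_clock;
    const clock::time_point start = clock::now();
    std::this_thread::sleep_for(requested); // blocks for at least `requested`
    const clock::duration elapsed = clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed) - requested;
}
```

Calling wakeup_overshoot(std::chrono::milliseconds(1)) in a loop and comparing the values with and without the "*PtrCounter += 1" line would show whether wakeups really are delayed when the increment is present.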