I am running some experiments on NUMA systems. I have a two-thread C program in which both threads share a single int64 variable that sits in one cache line. When I run both threads on the same node, the program takes nearly 50% more time to finish than when I bind the threads to two different NUMA nodes. I expected the opposite: since the threads share the data, they should finish faster when they run on the same node.
Am I missing something here?
The relevant code in each thread:

    pthread_mutex_lock(&shared_mutex);
    if (thinfo->thread_num == 1)
        shared_var++;
    else
        shared_var--;
    pthread_mutex_unlock(&shared_mutex);
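
For anyone who wants to reproduce this, here is a minimal sketch of the kind of test described above. It is not the exact program. It assumes Linux with glibc (pthread_setaffinity_np is a GNU extension) and that the CPU numbers passed in exist on the machine. The names run_pair and worker, the cpu and iterations fields, and the timing code are hypothetical and added for illustration; only the locked increment/decrement comes from the snippet above.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

static int64_t shared_var;
static pthread_mutex_t shared_mutex = PTHREAD_MUTEX_INITIALIZER;

struct thread_info {
    int thread_num;   /* 1 increments, any other value decrements */
    int cpu;          /* logical CPU to pin to (hypothetical field) */
    long iterations;  /* loop count (hypothetical field) */
};

static void *worker(void *arg)
{
    struct thread_info *thinfo = arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(thinfo->cpu, &set);
    /* Best effort: if pinning fails, the thread simply runs unpinned. */
    (void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    for (long i = 0; i < thinfo->iterations; i++) {
        pthread_mutex_lock(&shared_mutex);
        if (thinfo->thread_num == 1)
            shared_var++;
        else
            shared_var--;
        pthread_mutex_unlock(&shared_mutex);
    }
    return NULL;
}

/* Runs one incrementing and one decrementing thread, pinned to cpu_a and
 * cpu_b. Stores the final counter and the wall-clock time in the out
 * parameters. Returns 0 on success, -1 if a thread could not be created. */
int run_pair(int cpu_a, int cpu_b, long iterations,
             int64_t *final_value, double *elapsed_sec)
{
    struct thread_info a = { 1, cpu_a, iterations };
    struct thread_info b = { 2, cpu_b, iterations };
    struct timespec t0, t1;
    pthread_t ta, tb;

    shared_var = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *final_value = shared_var;
    *elapsed_sec = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return 0;
}
```

Calling run_pair once with two CPUs from the same node and once with CPUs from different nodes (numactl --hardware lists which CPUs belong to which node) and comparing elapsed_sec should show the difference I am asking about. Build with something like gcc -O2 -pthread.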