pthreads code not scaling up

I wrote the following very simple pthreads program to test how it scales. I am running it on a machine with 8 logical processors, and I never create more than 8 threads (to avoid context switching). As the number of threads increases, each thread has less work to do. The code also makes it evident that the threads share no data structures that could become a bottleneck. Still, performance degrades as I increase the number of threads. Can somebody tell me what I am doing wrong here?

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int NUM_THREADS = 3;
unsigned long int COUNTER = 10000000000000;
unsigned long int LOOP_INDEX;

void* addNum(void *data)
{
    unsigned long int sum = 0;
    for(unsigned long int i = 0; i < LOOP_INDEX; i++) {
        sum += 100;
    }
    return NULL;
}

int main(int argc, char** argv)
{
    NUM_THREADS = atoi(argv[1]);
    pthread_t *threads = (pthread_t*)malloc(sizeof(pthread_t) * NUM_THREADS);
    int rc;

    clock_t start, diff;

    LOOP_INDEX = COUNTER/NUM_THREADS;        
    start = clock();

    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_create((threads + t), NULL, addNum, NULL);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d", rc);
            exit(-1);
        }
    }

    void *status;
    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(threads[t], &status);
    }

    diff = clock() - start;
    int sec = diff / CLOCKS_PER_SEC;
    printf("%d",sec);
}

Note: All the answers I found online said that the overhead of creating the threads outweighs the work they do. To test this, I commented out everything in the addNum() function. After that, no matter how many threads I create, the program takes 0 seconds. So I think thread creation overhead is not the problem.

There is 1 best solution below

clock() counts the CPU time used by the whole process, summed across all threads. So all it is telling you is that you are using slightly more total CPU time as you add threads, which is exactly what you would expect.

It is the wall-clock elapsed time that should go down if your parallelisation is effective. Measure that with clock_gettime(), specifying the CLOCK_MONOTONIC clock, instead of clock().