I have tried both OpenMP and Cilk Plus. The result is the same: the multithreaded version runs slower.
I don't know what I'm doing wrong. I did exactly what the author did in this tutorial.
His code runs faster in parallel, while mine produces this:
PARALLEL: Fibonacci number #42 is 267914296
Calculated in 33.026 seconds using 8 workers
SERIAL: Fibonacci number #42 is 267914296
Calculated in 2.110 seconds using 8 workers
I copied the tutorial's source code exactly.
I also tried it with OpenMP, and the same thing happens there. I checked the CPU core usage during execution: all cores are busy, so that part looks fine.
I tried to change the number of workers with this command:
export CILK_NWORKERS=4
It appears that as the number of workers increases, the algorithm runs slower, though not always. I implemented the Cilk code in both C and C++, with no difference.
This is the sequential Fibonacci function:
int fib_s(int n)
{
    if (n < 2)
        return n;
    int x = fib_s(n-1);
    int y = fib_s(n-2);
    return x + y;
}
This is the parallel Fibonacci function:
int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
}
And this is how I measure the running time in the main() function:
clock_t start = clock();
int result = fib(n);
clock_t end = clock();
double duration = (double)(end - start) / CLOCKS_PER_SEC;
Can anyone help me?
The right answer to your question is hardware-dependent. Many factors influence the performance of code in general, and different software strategies can be applied to accelerate execution. However, some are more efficient than others depending on (1) the particular application and (2) the particular hardware platform. I recommend profiling your application.
Here you can find a general introduction to software profiling, and here a list of software tools that will help you with this task.
In this and this other link you can find information on profiling OpenMP applications (the case in your question).
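As one example of a software strategy for the OpenMP case: recursive Fibonacci creates a very large number of tiny tasks, so task-creation overhead can dominate the useful work. A common remedy is a cutoff below which the recursion stays serial. The following is only a minimal sketch under that assumption; the names `FIB_CUTOFF`, `fib_task` and `fib_parallel` and the threshold value are hypothetical and would need tuning by measurement. If compiled without OpenMP enabled, the pragmas are ignored and the code simply runs serially.

```c
/* Hypothetical threshold: below this n the recursion stays serial.
   The value 20 is an assumption, not a measured optimum. */
#define FIB_CUTOFF 20

/* Serial version, equivalent to fib_s in the question. */
static int fib_s(int n)
{
    if (n < 2)
        return n;
    return fib_s(n - 1) + fib_s(n - 2);
}

/* Task-parallel version: creates a task only while the subproblem is large. */
static int fib_task(int n)
{
    if (n < FIB_CUTOFF)
        return fib_s(n);

    int x, y;
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    y = fib_task(n - 2);
    #pragma omp taskwait   /* wait for the child task before reading x */
    return x + y;
}

/* Entry point: one thread seeds the task tree, the team executes the tasks. */
int fib_parallel(int n)
{
    int result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);
    return result;
}
```

Whether a cutoff helps, and which value works best, depends on the compiler, runtime and machine, which is one more reason to measure.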
It is always good practice to know and understand what is happening under the hood. This will allow you to locate the bottleneck within the well-known trio of application, code, and hardware.
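One thing that may be worth checking before reaching for a profiler is how the time itself is measured. The C function `clock()` reports processor time used by the program, and on Linux and other POSIX systems that is CPU time summed over all threads. With 8 busy workers it can therefore come out much larger than the time you actually waited. The sketch below records both CPU time and wall-clock time around the same call so they can be compared. It assumes a POSIX system that provides `clock_gettime`; the helper names `wall_seconds`, `cpu_seconds` and `time_fib` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX system, needed for clock_gettime */
#include <time.h>

/* Elapsed real (wall-clock) time in seconds, from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU time in seconds used by the whole process, as measured in the question. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Serial Fibonacci from the question, used here only as a workload. */
int fib_s(int n)
{
    if (n < 2)
        return n;
    return fib_s(n - 1) + fib_s(n - 2);
}

/* Hypothetical helper: time one call both ways. */
void time_fib(int n, double *wall, double *cpu, int *result)
{
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    *result = fib_s(n);
    *cpu = cpu_seconds() - c0;
    *wall = wall_seconds() - w0;
}
```

For a serial run the two numbers should be close to each other. If you swap in the parallel `fib` and the CPU time is several times the wall-clock time, the parallel version may not be slower at all. For OpenMP code, `omp_get_wtime()` is another way to get wall-clock time.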