Parallel implementation for Jacobi algorithm takes too much time

271 Views Asked by At

I implemented a parallel version of Jacobi's method for the resolution of a linear system. Doing some tests I noticed that the time to execute the function in parallel is very high compared to the time to execute the sequential function. This is strange because the Jacobi's method should be faster when executed with a parallel implementation.

I think I'm doing something wrong in the code:

function [x,niter,resrel] = Parallel_Jacobi(A,b,TOL,MAXITER)
[n, m] = size(A); 
D = 1./spdiags(A,0);
B = speye(n)-A./spdiags(A,0);
C= D.*b;
x0=sparse(zeros(length(A),1));
spmd

    cod_vett=codistributor1d(1,codistributor1d.unsetPartition,[n,1]);
    cod_mat=codistributor1d(1,codistributor1d.unsetPartition,[n,m]);

    B= codistributed(B,cod_mat);
    C= codistributed(C,cod_vett);
    x= codistributed(B*x0 + C,cod_vett);

    Niter = 1; 
    TOLX = TOL;  
    while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER)
        if(TOL*norm(x,Inf) > realmin)
            TOLX = norm(x,Inf)*TOL;
        else
            TOLX = realmin;
        end    
        x0 = x;
        x = B*x0 + C;
        Niter=Niter+1;
    end
end
Niter=Niter{1}; 
x=gather(x);
end

Below there are the tests

%sequential Jacobi
format long;
A = gallery('poisson',20);
tic;
x= jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 0.009054 seconds.
%parallel Jacobi
format long;
A = gallery('poisson',20);
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 11.484130 seconds.

I timed the parpool function with 1,2,3 and 4 workers (I have a quad core processor) and the result is the following:

%Test
format long;
A = gallery('poisson',20);
delete(gcp('nocreate'));
tic
%parpool(1/2/3/4) means that i executed 4 tests that differ only for the 
%argument in the function: first parpool(1), second parpool(2) and so on.
parpool(1/2/3/4);
toc
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;

4 workers: parpool=13.322899 seconds, function=23.772271 

3 workers: parpool=10.911769 seconds, function=16.402633 

2 workers: parpool=9.371729 seconds, function=12.945154 

1 worker: parpool=8.460357 seconds, function=7.982958 .

The less workers, the better the time is. Which is, like @Adriaan said, likely due overhead.

Does this mean that, in this case, the sequential function is always faster than the parallel function? Or is there a better way to implement the parallel one?

In this question it is said that the performance in parallel is better when the number of iterations is high. In my case, with this test, there are only 32 iteration.

The sequential implementation of Jacobi's method is this:

function [x,niter,resrel] = jacobi(A,b,TOL,MAXITER)
n = size(A,1); 
D = 1./spdiags(A,0);
B = speye(n)-A./spdiags(A,0);
C= D.*b;

x0=sparse(zeros(length(A),1));
x = B*x0 + C;
Niter = 1; 
TOLX = TOL;  

while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER) 
    if(TOL*norm(x,Inf) > realmin)
        TOLX = norm(x,Inf)*TOL;
    else
        TOLX = realmin;
    end    

    x0 = x;
    x = B*x0 + C;

    Niter=Niter+1;
end
end

I timed the code with the timeit function and the results are these(the inputs are the same of the previous):

4 workers: 11.693473075964102

3 workers: 9.221281335264003

2 workers: 9.150417240778545

1 worker: 6.047181992020434

sequential: 0.002893932969688

0

There are 0 best solutions below