I parallelised the code below, but the simulation time is actually 400 to 500 times longer than with the serial code. The only reason I can think of is the warnings 'variable x is indexed but not sliced in parfor loop' and 'variable p is indexed but not sliced in parfor loop'. Can anyone verify whether this is the reason for the huge increase in simulation time, or whether it is the way I parallelised the code?
`p(1,i)` and `x(1,i)` are matrices with values set beforehand.
```matlab
nt=1;
nc=32;
time(1,1) = 0.0;
for t=dt:dt:0.1
    nt=nt+1;
    time(1,nt) = t;
    disp(t);
    for ii=2:nc
        mytemp=zeros(1,ii);
        dummy=0.0;
        parfor jj=1:nc+1
            if ii==jj % skipped
                continue;
            end
            dxx = x(1,jj) - x(1,ii);
            rr=abs(dxx);
            if rr < re
                dummy(jj) = (p(nt-1,jj)-p(nt-1,ii))*kernel(rr,re,ktype)*rr;
                mytemp(jj) = kernel(rr,re,ktype)*rr;
                %sumw(1,ii) = sumw(1,ii) + kernel(rr,re,1);
            end
        end
        mysum = sum(dummy);
        zeta(1,ii)=sum(mytemp);
        lapp(1,ii) = 2.0*dim*mysum/zeta(1,ii);
        p(nt,ii) = p(nt-1,ii) + dt*lapp(1,ii);
    end
    % update boundary value
    p(nt,1) = function_phi(0,t);
    p(nt,nc+1) = function_phi(1,t);
end
```
I can't be sure that is the reason, but if some parts of the code end up being parallelized while others cannot, it will create a lot of overhead without any speedup. See for example the Q&A here for a more detailed discussion of slicing.
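Basically, if you have a `parfor` with a loop variable `jj`, then every statement in which `jj` is used on the right-hand side should also use `jj` on the left-hand side; that way the "job" can be divided between different processors, each of which tackles its own part of the array in parallel. As soon as that doesn't happen, for example in your lines `dxx = x(1,jj) - x(1,ii);` and `rr=abs(dxx);`, you break the paradigm. Because `x` is indexed with both `jj` and `ii`, MATLAB cannot split it into per-iteration slices and instead sends the whole array to the workers every time the `parfor` starts (it becomes a broadcast variable). 400x slower? I don't know about that, but the warning is pretty clear.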
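A minimal sketch of one way to get `x` and `p` sliced, assuming the inner loop body stays as in the question: read the `ii`-dependent values into scalars before the `parfor`, so that inside the loop `x` and the previous row of `p` are indexed only by `jj`. The setup values (`nc`, `re`, `ktype`, `nt`, `ii`, the positions `x` and the initial row of `p`) and the anonymous `kernel` are made-up placeholders so the block runs on its own; substitute your own `kernel` and data. The hypothetical names `xii`, `pii`, `pprev` and `w` are mine, not part of the original code.

```matlab
% --- Hypothetical setup so the sketch runs standalone (replace with your data) ---
nc = 32; re = 0.1; ktype = 1;        % placeholder sizes/parameters
nt = 2; ii = 5;                      % one time step, one interior point
x = linspace(0, 1, nc+1);            % positions (made up)
p = zeros(2, nc+1);
p(1,:) = sin(pi*x);                  % previous time level (made up)
kernel = @(r, re, ktype) re./r - 1;  % placeholder kernel, NOT your kernel

% --- Hoist everything that depends on ii (or nt) out of the parfor ---
xii   = x(1,ii);      % scalar, cheap to broadcast
pii   = p(nt-1,ii);   % scalar, cheap to broadcast
pprev = p(nt-1,:);    % row vector, indexed only by jj below, so it can be sliced

dummy  = zeros(1, nc+1);   % preallocated sliced outputs
mytemp = zeros(1, nc+1);
parfor jj = 1:nc+1
    if jj == ii
        continue;              % skip the self-interaction
    end
    rr = abs(x(1,jj) - xii);   % temporary: each iteration has its own rr
    if rr < re
        w = kernel(rr, re, ktype) * rr;   % evaluate the kernel once
        dummy(jj)  = (pprev(jj) - pii) * w;
        mytemp(jj) = w;
    end
end
mysum   = sum(dummy);
zeta_ii = sum(mytemp);
```

Copying the row into `pprev` is a deliberate choice: I am not certain whether `p(nt-1,jj)`, with the computed index `nt-1`, would be classified as sliced, and the copy sidesteps that question. Note that this removes the broadcast of `x` and `p`, but not the cost of starting a new `parfor` for every `ii` in every time step.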
Your `dxx` and `rr` lines could be consolidated, by the way, by computing `rr` directly (it doesn't need to be an array), `rr = abs(x(1,jj) - x(1,ii));`, and then using that value later in the loop. Inside a `parfor`, a variable like `rr` that is simply assigned in each iteration is a temporary variable, so every iteration effectively has its own copy; this is much like a `private` variable in OpenMP.

`p` is indexed inside the `parfor` loop, but only on the right-hand side (`p(nt-1,jj)` and `p(nt-1,ii)`), which is enough to trigger the same warning as for `x`. It is updated outside the inner loop, where that does not matter. You might find it helpful to profile your code with the parallel profiler http://www.mathworks.com/help/distcomp/profiling-parallel-code.html; it will be instructive.
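A cruder check than the parallel profiler, if you only want to see the launch overhead: time the same trivial loop body with `for` and with `parfor`, each started many times from an outer loop, the way the `parfor` in the question is started once per `ii` (31 times for `nc=32`) in every time step. The body, `n`, `reps` and `v` below are arbitrary choices for illustration, and the timings depend entirely on your machine and pool, so no numbers are claimed here. If no pool is open, MATLAB may start one automatically the first time it meets a `parfor` (depending on your parallel preferences), so the sketch does one warm-up `parfor` before timing.

```matlab
% Hypothetical micro-benchmark: same trivial body, launched many times with
% for and with parfor. Sizes and body are arbitrary, chosen to mimic nc+1 = 33.
n    = 33;          % iterations per inner loop, like jj = 1:nc+1
reps = 1000;        % number of times the inner loop is started
v    = linspace(0, 1, n);

% Warm-up: if a pool has to be started, do it before timing
warm = zeros(1, 2);
parfor k = 1:2
    warm(k) = k;
end

tic;
for r = 1:reps
    out_for = zeros(1, n);
    for jj = 1:n
        out_for(jj) = v(jj)^2;
    end
end
t_for = toc;

tic;
for r = 1:reps
    out_par = zeros(1, n);
    parfor jj = 1:n
        out_par(jj) = v(jj)^2;   % v and out_par are both sliced by jj
    end
end
t_par = toc;

fprintf('for: %.4f s   parfor: %.4f s\n', t_for, t_par);
```

Even with every variable sliced, I would expect the `parfor` version to be the slower one for a body this small; parallelism only pays off when each launch carries enough work to amortize the scheduling and data transfer.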