I parallelised the code below, but the simulation actually takes 400-500 times longer than the serial code. The only cause I can think of is the pair of messages 'variable x is indexed but not sliced in parfor loop' and 'variable p is indexed but not sliced in parfor loop'. Can anyone verify whether this is the reason for the huge increase in simulation time, or whether it is the way I parallelised the code?
p(1,i) and x(1,i) are matrices with values set beforehand.
nt=1;
nc=32;
time(1,1) = 0.0;
for t=dt:dt:0.1
    nt=nt+1;
    time(1,nt) = t;
    disp(t);
    for ii=2:nc
        mytemp=zeros(1,ii);
        dummy=0.0;
        parfor jj=1:nc+1
            if ii==jj % skipped
                continue;
            end
            dxx = x(1,jj) - x(1,ii);
            rr=abs(dxx);
            if rr < re
                dummy(jj) = (p(nt-1,jj)-p(nt-1,ii))*kernel(rr,re,ktype)*rr;
                mytemp(jj) = kernel(rr,re,ktype)*rr;
                %sumw(1,ii) = sumw(1,ii) + kernel(rr,re,1);
            end
        end
        mysum = sum(dummy);
        zeta(1,ii)=sum(mytemp);
        lapp(1,ii) = 2.0*dim*mysum/zeta(1,ii);
        p(nt,ii) = p(nt-1,ii) + dt*lapp(1,ii);
    end
    % update boundary value
    p(nt,1) = function_phi(0,t);
    p(nt,nc+1) = function_phi(1,t);
end
I can't be sure that is the reason, but if some parts of the code end up being parallelized while others cannot be, you get a lot of overhead without any speedup. See, for example, the Q&A here for a more detailed discussion of slicing.
Basically, if you have a parfor with a variable jj, then every statement in which jj is used on the right-hand side should also use jj on the left-hand side - in that way, the "job" can be divided between different processors, each of which tackles part of the array in parallel. As soon as that doesn't happen, for example in your lines that assign dxx and rr, you break the paradigm. 400x slower? I don't know about that - but the warning is pretty clear.
The first two lines could be consolidated, by the way, by computing rr(jj) in a single statement (although you don't need an array). You then use that value rather than rr later in the loop. This is a bit like having a private variable for each copy of the loop (a concept that I don't think Matlab has - but exists in OMP).

p is read inside the parfor loop (as p(nt-1,jj) and p(nt-1,ii)), which is why it is reported as indexed but not sliced; it is only updated outside of the inner loop, where it ought not to matter.

You might find it helpful to profile your code with the parallel profiler http://www.mathworks.com/help/distcomp/profiling-parallel-code.html - it will be instructive.
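As a rough illustration of the points above, the inner loop could be restructured along these lines. This is only a sketch under assumptions: x, p_prev (standing in for p(nt-1,:)), re and the weight max(re - rr, 0) (standing in for kernel(rr,re,ktype)) are made-up values, not the question's data. The ii-dependent scalars are pulled out before the loop, dummy and mytemp are preallocated to length nc+1 and written only at index jj, and rr is assigned afresh in every iteration, which MATLAB treats as a temporary variable.

```matlab
% Hypothetical stand-ins for the question's data (assumptions, not the real values).
nc     = 32;
re     = 0.1;
x      = linspace(0, 1, nc+1);   % assumed positions
p_prev = sin(pi * x);            % assumed row p(nt-1,:)
ii     = 5;                      % one fixed outer index, for demonstration

xi  = x(ii);                     % scalars extracted up front: x and p_prev are then
p_i = p_prev(ii);                % indexed only by jj inside the loop, i.e. sliced

dummy  = zeros(1, nc+1);         % full-length outputs, written only at index jj
mytemp = zeros(1, nc+1);
parfor jj = 1:nc+1
    rr = abs(x(jj) - xi);        % temporary scalar, local to each iteration
    if jj ~= ii && rr < re
        wr = max(re - rr, 0) * rr;   % stand-in for kernel(rr,re,ktype)*rr
        dummy(jj)  = (p_prev(jj) - p_i) * wr;
        mytemp(jj) = wr;
    end
end

mysum = sum(dummy);
zeta  = sum(mytemp);
```

Even with clean slicing, the loop body here is only 33 very cheap iterations, so the cost of dispatching a parfor for every ii and every time step may well outweigh the work itself; that is worth checking with the profiler before drawing conclusions.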