How to save intermediate iterations during SPMD in MATLAB?


I am experimenting with MATLAB SPMD. However, I have the following problem to solve:

  • I am running a fairly long algorithm and would like to save progress along the way in case the power gets cut, someone unplugs the machine, or a memory error occurs.
  • The loop has 144 iterations that each take around 30 minutes to complete => 72 h. A lot of problems can occur in that interval. Of course, I have the distributed computing toolbox on my machine. The computer has 4 physical cores. I run MATLAB R2016a.
  • I do not really want to use a parfor loop because I concatenate results and have dependencies across iterations. I think SPMD is the best choice for what I want to do.

I'll try to describe what I want as best I can: I want to be able to save the results so far at set iterations of the loop, and I want to save them separately for each worker.

Below is a Minimum (non)-Working Example. The last four lines should be put in a different .m file. This function, when called within a parfor loop, allows intermediate results to be saved. It works properly in other routines that I use. The error is at line 45 (output_save). Somehow, I would like to "pull" the Composite object into a "regular" object (cell/structure).

My hunch is that I do not quite understand how Composite objects work and especially how they can be saved into "regular" objects (cells, structures, etc).

% SPMD MWE

% Clear necessary things
clear output output2 output_temp iter kk


% Useful thing that will be used later on
Rorder=perms(1:4);

% Stem of the file to save the data to
stem='MWE_MATLAB_spmd';

% Create empty cells where the results of the kk loop will be stored
output1{1,1}=[];
output2{1,2}=[];

% Start the parpool
poolobj=gcp;

% Define which worker/lab will do which iteration
iterperworker=ceil(size(Rorder,1)/poolobj.NumWorkers);
for i=1:poolobj.NumWorkers
    if i<poolobj.NumWorkers
        itertodo{1,i}=1+(iterperworker)*(i-1):iterperworker*i;
    else
        itertodo{1,i}=1+(iterperworker)*(i-1):size(Rorder,1);
    end
end

%Start the spmd
% try
spmd
    iter=1;
    for kk=itertodo{1,labindex}
        % Print which iteration is done at the moment
        fprintf('\n');
        fprintf('Ordering %d/%d \r',kk,size(Rorder,1));

        for j=1:size(Rorder,2)
            output_temp(1,j)=Rorder(kk,j).^j; % just to populate a structure
        end
        output.output1{1,1}=cat(2,output.output1{1,1},output_temp);  % Concatenate the results
        output.output2{1,2}=cat(2,output.output1{1,2},0.5*output_temp);  % Concatenate the results

        labindex_save=labindex;

        if mod(iter,2)==0
            output2.output=output; % manually put output in a structure
            dosave(stem,labindex_save,output2); % Calls the function that allows me to save in parallel computing
        end
        iter=iter+1;
    end
end
% catch me
% end


% Function to paste in another m-file
% function dosave(stem,i,vars)
%     save(sprintf([stem '%d.mat'],i),'-struct','vars')
% end

Best answer

A Composite is created only outside an spmd block. In particular, variables that you define inside an spmd block exist as a Composite outside that block. When the same variable is used back inside an spmd block, it is transformed back into the original value. Like so:

spmd
    x = labindex;
end
isa(x, 'Composite') % true
spmd
    isa(x, 'Composite') % false
    isequal(x, labindex) % true
end

So, you should not be transforming output using {:} indexing - it is not a Composite. I think you should simply be able to use

dosave(stem, labindex, output);
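
As a rough illustration of that suggestion, here is a minimal sketch of per-worker checkpointing. It assumes the dosave helper from the question is saved as dosave.m on the path (repeated in the trailing comment). The file stem 'spmd_checkpoint_demo', the variables partial and checkpoint, and the multiplication standing in for the real 30-minute computation are all made up for the example. Inside the spmd block every variable is an ordinary MATLAB value, so it can be packed into a struct and handed to dosave directly. Only on the client, after the block ends, does partial become a Composite that has to be indexed with {w}.

```matlab
stem = 'spmd_checkpoint_demo';   % hypothetical file stem

spmd
    partial = [];                               % ordinary array on each worker
    for k = 1:4
        partial = [partial, labindex * k];      % placeholder for the real work
        if mod(k, 2) == 0                       % checkpoint every 2nd iteration
            checkpoint.partial  = partial;      % plain struct, not a Composite
            checkpoint.lastIter = k;
            dosave(stem, labindex, checkpoint); % one .mat file per worker
        end
    end
end

% Back on the client, partial is a Composite with one entry per worker.
for w = 1:numel(partial)
    fprintf('worker %d: %s\n', w, mat2str(partial{w}));
end

% dosave.m, as given in the question (kept in its own file):
% function dosave(stem,i,vars)
%     save(sprintf([stem '%d.mat'],i),'-struct','vars')
% end
```

Saving lastIter alongside the data is one possible way to let a restarted run load each worker's file and skip the iterations that were already completed. The resume logic itself is not shown here.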