Is appdata shared between workers in a parallel pool?


I'm working on a complicated function that calls several subfunctions (within the same file). To pass data around, the setappdata/getappdata mechanism is used occasionally. Moreover, some subfunctions contain persistent variables (initialized once in order to save computations later).

I've been considering whether this function can be executed on several workers in a parallel pool, but became worried that there might be some unintended sharing of data that should be unique to each worker.

My question is: how can I tell whether data stored in global variables, persistent variables, or appdata is shared between the workers or unique to each one?

Several possibly-relevant things:

  1. In my case, tasks are completely parallel and their results should not affect each other in any way (parallelization is done simply to save time).
  2. There aren't any temporary files or folders being created, so there is no risk of one worker mistakenly reading the files that were left by another.
  3. All persistent and appdata-stored variables are created/assigned within subfunctions called from the parfor loop.

I know that each worker corresponds to a new process with its own memory space (and presumably, global/persistent/appdata workspace). Based on that and on this official comment, I'd say it's probable that such sharing does not occur... But how do we ascertain it?

Related material:

  1. This Q&A.
  2. This documentation page.

There are 2 best solutions below

Best answer

This is quite straightforward to test, and we shall do it in two stages.

Step 1: Manual Spawning of "Workers"

First, create these 3 functions:

%% Worker 1:
function q52623266_W1
global a; a = 5;
setappdata(0, 'a', a);
someFuncInSameFolder();
end

%% Worker 2:
function q52623266_W2
global a; disp(a);
disp(getappdata(0,'a'));
someFuncInSameFolder();
end

function someFuncInSameFolder()
  persistent b; 
  if isempty(b)
    b = 10;
    disp('b is now set!');
  else
    disp(b);
  end
end

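Next we boot up 2 MATLAB instances (representing two different workers of a parallel pool), then run q52623266_W1 on one of them, wait for it to finish, and run q52623266_W2 on the other. If data were shared, the 2nd instance would print 5 twice (the global and the appdata value) followed by 10 (the persistent variable). If it is not shared, disp prints nothing for the empty values and only 'b is now set!' appears. This results (on R2018b) in: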

>> q52623266_W1
b is now set!

>> q52623266_W2
b is now set!

Which means that data is not shared. So far so good, but one might wonder whether this represents an actual parallel pool. So we can adjust our functions a bit and move on to the next step.

Step 2: Automatic Spawning of Workers

function q52623266_Host

spmd(2)
  if labindex == 1
    setupData();
  end
  labBarrier; % make sure that the setup stage was executed.
  if labindex == 2
    readData();
  end  
end

end

function setupData
  global a; a = 5;
  setappdata(0, 'a', a);
  someFunc();
end

function readData
  global a; disp(a);
  disp(getappdata(0,'a'));
  someFunc();
end

function someFunc()
  persistent b; 
  if isempty(b)
    b = 10;
    disp('b is now set!');
  else
    disp(b);
  end
end

Running the above we get:

>> q52623266_Host
Starting parallel pool (parpool) using the 'local' profile ...
connected to 2 workers.
Lab 1: 
  b is now set!
Lab 2: 
  b is now set!

Which again means that data is not shared. Note that in the second step we used spmd, which should function similarly to parfor for the purposes of this test.
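
The same check can also be run with parfor directly. The following is only a minimal sketch, not part of the test above: checkParforIsolation, writeGlobal, readGlobal and the name isoTestValue are hypothetical, and it assumes Parallel Computing Toolbox with a process-based pool. The client sets a global and an appdata entry, and each iteration reports whether its worker can see either of them.

```matlab
function shared = checkParforIsolation()
% Hypothetical helper (not from the answer above). Assumes Parallel
% Computing Toolbox and a process-based pool.
writeGlobal(5);                          % set the global on the client only
setappdata(0, 'isoTestValue', 5);        % set the appdata on the client only

n = 8;
sawGlobal  = false(1, n);
sawAppdata = false(1, n);
parfor k = 1:n
    % global declarations are not allowed directly in a parfor body,
    % so the global is read through a helper function
    sawGlobal(k)  = ~isempty(readGlobal());
    sawAppdata(k) = isappdata(0, 'isoTestValue');
end

% true if any iteration saw the client's data
shared = any(sawGlobal) || any(sawAppdata);

% clean up the client session
rmappdata(0, 'isoTestValue');
writeGlobal([]);
end

function writeGlobal(v)
global isoTestValue
isoTestValue = v;
end

function v = readGlobal()
global isoTestValue
v = isoTestValue;
end
```

If the workers are isolated from the client, shared should come back false. Note that when no pool can be opened, parfor runs on the client itself and the check will report true, so it is only meaningful with a running pool.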

Second answer

There is another kind of non-sharing that bit me.

Persistent variables are not even copied from the client session to the workers.

To demonstrate, create a simple function with a persistent variable (MATLAB R2017a):

function [ output_args ] = testPersist( input_args )
%TESTPERSIST Simple persistent variable test.

persistent var

if (isempty(var))
    var = 0;
end
if (nargin == 1)
    var = input_args;
end

output_args = var;

end

And a short script is executed:

testPersist(123); % Set persistent variable to 123.
tpData = zeros(100,1);
parfor i = 1 : 100
    tpData(i) = testPersist;
    testPersist(i);
end
any(tpData == 0) % true (1) implies that a worker started from 0 instead of the 123 set on the first line.

Output is 1 - workers disregarded the 123 from the parent workspace and started anew.

Inspecting the values in tpData also shows the order in which each worker processed its iterations. For example, tpData(14) = 15 means that the worker that completed iteration 15 continued with iteration 14 next.

So creating a worker means starting a completely new instance of MATLAB, unrelated to the instance you have open in front of you. This holds for every worker separately.

The lesson I took from this: don't use simple persistent variables as the simulation config. It worked fine and looked elegant as long as no parfor was used, but broke horribly afterwards. Use objects.
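
One way to act on that lesson is to pass the configuration into the loop explicitly, so that parfor sends it to every worker as an ordinary variable. The sketch below is an illustration under assumptions, not the code from this answer: a plain struct stands in for the recommended object, and runWithExplicitConfig, simulateStep, cfg and gain are hypothetical names.

```matlab
function out = runWithExplicitConfig()
% Hypothetical example: the configuration travels with the loop
% instead of living in a persistent variable on the client.
cfg = struct('gain', 123);      % assumed config field, for illustration only

out = zeros(100, 1);
parfor i = 1:100
    % cfg is used unchanged in every iteration, so parfor treats it as a
    % broadcast variable and copies it to each worker
    out(i) = simulateStep(cfg, i);
end
end

function y = simulateStep(cfg, i)
% The worker reads the configuration from its input argument,
% so no state needs to exist on the worker beforehand.
y = cfg.gain + i;
end
```

Because every iteration receives cfg as an argument, the result does not depend on which worker runs it or on whether a pool is open at all.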