I have a dataset which includes approximately 2000 digital images. I am using MATLAB to perform some digital image processing to extract trees from the imagery. The script is currently configured to process the images in a parfor loop on n cores.
The challenge:
I have access to processing time on a University-managed supercomputer with approximately 10,000 compute cores. If I submit the entire job for processing, I get put so far back in the tasking queue that a desktop computer could finish the work before processing even starts on the supercomputer. I have been told by support staff that partitioning the 2000-file dataset into ~100-file jobs will significantly decrease the queue time. What method can I use to keep the processing in a parfor loop while submitting only 100 files (of the 2000) at a time?
My script is structured in the following way:
datadir = 'C:\path\to\input\files';
files = dir(fullfile(datadir, '*.tif'));
fileIndex = find(~[files.isdir]);

parfor ix = 1:length(fileIndex)
    % Perform the processing on each file
end
Similar to my comment, I would suggest something like the following:
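A minimal sketch, assuming your submission script passes each job a chunk number (read here from a hypothetical CHUNK_INDEX environment variable); processFile stands in for whatever per-file processing you already do:

% Each submitted job processes only its own chunk of ~100 files.
datadir   = 'C:\path\to\input\files';
files     = dir(fullfile(datadir, '*.tif'));
fileIndex = find(~[files.isdir]);

chunkSize = 100;                                % files per submitted job
chunkIdx  = str2double(getenv('CHUNK_INDEX'));  % set by the submission script (assumed name)

firstFile = (chunkIdx - 1) * chunkSize + 1;
lastFile  = min(chunkIdx * chunkSize, numel(fileIndex));

parfor ix = firstFile:lastFile
    thisFile = fullfile(datadir, files(fileIndex(ix)).name);
    % Perform the processing on each file, e.g.:
    % result = processFile(thisFile);   % processFile is a placeholder for your routine
end

Your submission script then launches 20 jobs (2000 / 100), each with a different CHUNK_INDEX. The queue only ever sees small ~100-file jobs, but each job still runs the same parfor over its own slice, so the per-job parallelism is unchanged.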