MATLAB: accessing a loaded MAT file is very slow


I'm currently working on a project that involves saving/loading fairly big MAT files (around 150 MB), and I realized that accessing a loaded cell array is much slower than accessing the equivalent array created directly inside a script or a function.

I created this example to simulate my code and show the difference:

clear; clc;

disp('Test for computing with loading');

if exist('data.mat', 'file')
    delete('data.mat');
end

n_tests = 10000;
data = {};
for i=1:n_tests
    data{end+1} = rand(1, 4096);
end

% disp('Saving data');
% save('data.mat', 'data');
% clear('data');
% 
% disp('Loading data');
% load('data.mat', '-mat');

for i=1:n_tests
    tic;
    for j=1:n_tests
        d = sum((data{i} - data{j}) .^ 2);
    end
    time = toc;
    disp(['#' num2str(i) ' computed in ' num2str(time) ' s']);
end

As written, no MAT file is saved or loaded, and the average time for one iteration over i is 0.75 s. When I uncomment the lines that save and load the file, one iteration over i takes about 6.2 s (the saving/loading time itself is not taken into account). That is roughly 8x slower!

I'm using 64-bit MATLAB 7.12.0 (R2011a) on 64-bit Windows 7, and the MAT files are saved in the v7.3 format.

Could it be related to the compression of the MAT file, or to variable caching? Is there any way to prevent or avoid this?

There are 2 answers below.

BEST ANSWER

I know this problem too. I think it is related to MATLAB's inefficient memory management - as far as I remember, it does not handle swapping well. A 150 MB file can easily hold a lot of data - maybe more than can be allocated quickly.

I made a quick calculation for your example using the information from MathWorks. In your case, total_size = n_tests*121 + n_tests*(1*4096*8) bytes, which is about 313 MB.
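As a sanity check, you can ask MATLAB directly how much memory the cell array occupies; a quick sketch (the exact byte count varies with MATLAB version and platform):

info = whos('data');                                  % variable metadata from the current workspace
fprintf('data occupies %.1f MB in memory\n', info.bytes / 2^20);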

First, I would suggest saving the data in format v7 (instead of v7.3) - I have noticed very poor performance when reading the newer format. That alone could be the reason for your slowdown.
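For example, the save line in the question's script would only need the version flag; nothing else changes:

save('data.mat', 'data', '-v7');   % v7 instead of the HDF5-based v7.3
load('data.mat', '-mat');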

Personally I solved this in two ways:

  1. Split the data into smaller sets, and then use functions that load the data only when needed or create it on the fly (this can be done elegantly with classes); see the sketch after this list.
  2. Move the data into a database. SQLite and MySQL are great; both handle much larger datasets efficiently (terabytes rather than gigabytes), and SQL makes it easy to pull out just the subset you want to manipulate.
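For the first option, a minimal sketch of the chunked approach (the chunk size, file names and the get_entry helper are only illustrative):

chunk_size = 1000;                                   % entries per MAT file
n_chunks = ceil(numel(data) / chunk_size);
for k = 1:n_chunks
    idx = (k-1)*chunk_size + 1 : min(k*chunk_size, numel(data));
    chunk = data(idx);
    save(sprintf('data_chunk_%03d.mat', k), 'chunk', '-v7');
end

% in its own file get_entry.m: load only the chunk that contains entry i
function d = get_entry(i, chunk_size)
    k = ceil(i / chunk_size);
    s = load(sprintf('data_chunk_%03d.mat', k), 'chunk');
    d = s.chunk{i - (k-1)*chunk_size};
end

In practice you would also cache the most recently loaded chunk (or wrap this in a small class, as mentioned above), so that consecutive accesses to nearby entries do not hit the disk every time.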
SECOND ANSWER

I tested this code on 64-bit Windows with 64-bit MATLAB R2014b.

Without saving and loading, the computation takes around 0.22 s. If I save the data file with '-v7' and then load it, the computation takes around 0.2 s. If I save it with '-v7.3' and then load it, the computation takes around 4.1 s. So it is related to the compression of the MAT file.
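A compact way to reproduce that comparison (a rough sketch; the file names are illustrative and the timings are machine-dependent):

save('data_v7.mat',  'data', '-v7');
save('data_v73.mat', 'data', '-v7.3');

for fname = {'data_v7.mat', 'data_v73.mat'}
    s = load(fname{1});
    tic;
    for j = 1:numel(s.data)
        d = sum((s.data{1} - s.data{j}) .^ 2);
    end
    fprintf('%s: one pass computed in %.2f s\n', fname{1}, toc);
end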