Loading huge cell array containing structures


I have an issue with saving and loading a huge dataset in Matlab.

My dataset contains properties of a series of images, computed with MATLAB's regionprops. I currently have a MAT-file of about 21 GB, and it takes a while to load.
The MAT-file holds one cell array containing, for each slice, a structure array with the properties of the ellipses on that slice.

Are there any suggestions for how to get around this? Is there a better, more efficient way of saving MAT-files than the -v7.3 format?

Best answer

One solution is to pass the 'table' argument to regionprops, which makes it return a table instead of a struct array. A table is much more efficient to store than a struct array.
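Since the 21 GB file already exists, a minimal sketch of converting the stored struct arrays to tables with struct2table might look like this. The file names and the variable name props are hypothetical, and the two-slice props below is made-up stand-in data for the loaded cell array. For newly processed images you would call regionprops('table', ...) directly instead.

```matlab
% Hypothetical stand-in for the loaded cell array (one struct array per slice).
% With the real file this would be something like:
%   S = load('props.mat', 'props'); props = S.props;
props = { ...
    struct('MajorAxisLength', {10; 12}, 'MinorAxisLength', {4; 5}, 'Orientation', {30; -15}), ...
    struct('MajorAxisLength', {8}, 'MinorAxisLength', {3}, 'Orientation', {45})};

propsTbl = cell(size(props));
for k = 1:numel(props)
    % 'AsArray' treats a scalar struct (a slice with a single object) as a
    % one-element array, so it always becomes a one-row table.
    propsTbl{k} = struct2table(props{k}, 'AsArray', true);
end

% Hypothetical output file name.
save(fullfile(tempdir, 'props_table.mat'), 'propsTbl', '-v7.3');
```

Each table stores one column array per property, so the per-object headers of the struct arrays disappear.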

Better yet, if you don't mind keeping track manually of which data is where, create a numeric array with the relevant data:

```matlab
BW = imread('text.png'); % Example image used in the docs
s = regionprops(BW,{'MajorAxisLength','MinorAxisLength','Orientation'});          % struct array, one element per object
t = regionprops('table',BW,{'MajorAxisLength','MinorAxisLength','Orientation'});  % table, one row per object
m = [s.MajorAxisLength; s.MinorAxisLength; s.Orientation];                         % one row per property, one column per object

whos
```

```
  Name        Size             Bytes  Class      Attributes

  BW        256x256            65536  logical              
  m           3x88              2112  double               
  s          88x1              31872  struct               
  t          88x3               3496  table                
```

A numeric array is a much more efficient way of storing data than a struct array, because each field of each element in the struct array is a separate matrix that needs its own header; here that is 88 × 3 = 264 small matrices. The header (about 112 bytes on 64-bit MATLAB, I believe) is far larger than the value stored in each matrix (a single 8-byte double in this case), hence the overhead of 31872 / 2112 ≈ 15.1.
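To check that this overhead comes from the per-array headers and not from regionprops itself, here is a small sketch that stores the same 264 random doubles once as a 3x88 matrix and once as an 88x1 struct array with three scalar fields. The field names a, b and c are arbitrary placeholders. The exact byte counts depend on the MATLAB version and platform, so only the ratio matters.

```matlab
n = 88;
vals = rand(3, n);   % one 3-by-88 double array: 264 values, a single header

% The same values as an 88-by-1 struct array with three scalar fields:
% 264 separate 1-by-1 arrays, each with its own header.
sArr = struct('a', num2cell(vals(1, :).'), ...
              'b', num2cell(vals(2, :).'), ...
              'c', num2cell(vals(3, :).'));

infoNum    = whos('vals');
infoStruct = whos('sArr');
ratio = infoStruct.bytes / infoNum.bytes;
fprintf('numeric: %d bytes, struct: %d bytes, ratio %.1f\n', ...
        infoNum.bytes, infoStruct.bytes, ratio);
```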

The table stores each column in a separate array, so the overhead is much smaller: instead of 3 × 88 = 264 arrays (number of features × number of objects), you have only 3.
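A short sketch of what that looks like in practice, reusing the documentation image from the example above: each table variable is one column array, and brace indexing turns the whole table into a single numeric matrix when every variable is numeric. The variable names majorAxes and mFromTable are only for illustration.

```matlab
BW = imread('text.png');
t = regionprops('table', BW, {'MajorAxisLength', 'MinorAxisLength', 'Orientation'});

% Each table variable is a single column of doubles, not one array per object.
majorAxes = t.MajorAxisLength;

% Brace indexing concatenates all variables into one nObjects-by-3 double
% matrix; transposing gives the same one-row-per-property layout as m.
mFromTable = t{:, :}.';
```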

If each image is guaranteed to have the same number of objects, you could consider putting these matrices into a single 3D array instead of a cell array. The gain here would be smaller, because the cell array adds only one header per slice rather than one per value.
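A minimal sketch of the per-slice numeric approach with the optional 3D stacking, assuming the slices are available as a cell array of 2-D logical images. Here the documentation image and its mirror image are a hypothetical two-slice stand-in stack (mirroring does not change the number of connected objects), and the output file names are made up.

```matlab
BW = imread('text.png');
slices = {BW, fliplr(BW)};   % hypothetical stand-in stack
propNames = {'MajorAxisLength', 'MinorAxisLength', 'Orientation'};

% One 3-by-nObjects matrix per slice; rows follow the order of propNames,
% which is saved alongside the data to keep track of what is where.
mats = cell(numel(slices), 1);
for k = 1:numel(slices)
    s = regionprops(slices{k}, propNames);
    mats{k} = [s.MajorAxisLength; s.MinorAxisLength; s.Orientation];
end

nObjects = cellfun(@(x) size(x, 2), mats);
if all(nObjects == nObjects(1))
    % Same object count on every slice: one 3-by-nObjects-by-nSlices array.
    props3 = cat(3, mats{:});
    save(fullfile(tempdir, 'props_stack.mat'), 'props3', 'propNames', '-v7.3');
else
    % Counts differ: keep the cell array of per-slice matrices.
    save(fullfile(tempdir, 'props_cells.mat'), 'mats', 'propNames', '-v7.3');
end
```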