I have a stack of 1000 images containing more than 120k objects. After running a connected-component algorithm in a third-party software, I obtained a coloured image in which each connected object is assigned a label, a color, and an ID value.
I need to extract the list of the voxels coordinates that make up each of the objects.
I have written a Python code to extract this information, but it is extremely slow: it processes only ~30 labels/min, so it would take more than 2 days to process the entire scan even in the best-case scenario.
After converting the stack into a 3D NumPy array (`img_stack_np`), here is the main part of the code:
```python
# Store the labels and their associated voxels in a dictionary
labels, counts = np.unique(img_stack_np, return_counts=True)
labels_list = list(labels)
dict_labels_and_voxels = {}
for label in labels_list:
    if label == 0:  # We don't want to store the label 0 (background)
        continue
    index = np.where(img_stack_np == label)
    dict_labels_and_voxels[label] = [index]
```
How could I improve the speed of it?
The main issue with the current code lies in the line `np.where(img_stack_np == label)`. Indeed, it iterates over the 700 * 700 * 1000 = 490,000,000 values of `img_stack_np` for each of the 120,000 values of `labels_list`, resulting in 490,000,000 * 120,000 = 58,800,000,000,000 values to check.

You do not need to iterate over `img_stack_np` for every label. What you need is to classify the voxels by their value (i.e. label). You can do that using a custom sort (keeping `np.unique` to list the labels, for the sake of simplicity). To limit memory usage, an index-based sort can be used in place of a key-value sort; this can be done with `argsort`.

On my machine, with a 100x100x1000 integer-based random input containing 12,000 labels, the proposed implementation is 180 times faster than the original code.
It should be several thousand times faster for your specified input.
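The index-based sort described above can be sketched as follows. This is a reconstruction rather than the answer's original snippet: the flattened labels are sorted once with `argsort`, and each object's voxel coordinates are then recovered by slicing the sorted index array (the helper name `group_voxels_by_label` is illustrative):

```python
import numpy as np

def group_voxels_by_label(img):
    """Map each non-zero label to an (n_voxels, 3) array of (z, y, x) coordinates."""
    flat = img.ravel()
    order = np.argsort(flat, kind='stable')   # sort all voxel indices by label, once
    sorted_labels = flat[order]
    # Runs of identical labels are now contiguous; find where each run starts
    labels, starts = np.unique(sorted_labels, return_index=True)
    ends = np.append(starts[1:], flat.size)
    result = {}
    for label, start, end in zip(labels, starts, ends):
        if label == 0:                        # skip the background
            continue
        # Convert the flat indices of this run back to 3D coordinates
        coords = np.unravel_index(order[start:end], img.shape)
        result[label] = np.stack(coords, axis=1)
    return result
```

A single call, `dict_labels_and_voxels = group_voxels_by_label(img_stack_np)`, then replaces the whole per-label `np.where` loop.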
Additional notes:
- Using `float32` values in `img_stack_np` takes a lot of memory: 700 * 700 * 1000 * 4 ~= 2 GB. Use `float16` or even `int8`/`int16` types if you can.
- Storing the position of each voxel also takes a lot of memory: from 3 GB (with `int16`) up to 12 GB (with `int64`). Consider using a more compact data representation or a smaller dataset.
- For further improvement, look at this post to replace the quite slow `np.unique` call.
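The memory figures in the first note can be checked without allocating anything, just from the itemsize of each dtype (the shape below is the 700x700x1000 stack quoted in the question):

```python
import numpy as np

shape = (700, 700, 1000)                   # stack dimensions from the question
n_voxels = shape[0] * shape[1] * shape[2]  # 490,000,000 voxels

# Footprint of the whole stack for a few candidate dtypes
for dtype in (np.float32, np.float16, np.int16, np.int8):
    size_gib = n_voxels * np.dtype(dtype).itemsize / 1024**3
    print(f"{np.dtype(dtype).name}: {size_gib:.2f} GiB")
```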