I am working with images larger than 8 GB in .svs format. Using openslide, I have read them into 1D numpy arrays. To feed them into an algorithm, I need to reshape them back into image form so that I can process information tied to pixel locations. Since the images are very large, using PIL to convert the numpy array with
image=np.load('test.npy')
im=Image.fromarray(image)
throws the error "size does not fit in int". I tried to work around this error by changing the dtype from uint8 to uint64, but Python keeps crashing despite my workstation having 64 GB RAM and 3 TB memory.
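For scale, here is a quick size check that may explain both failures. It is only a sketch: it uses the image shape (44331, 64625, 3) quoted later in this question, and it assumes that the limit behind the PIL error is a signed 32-bit integer, which is my guess from the error message.

```python
# Back-of-the-envelope size check for the image shape quoted in this question.
height, width, channels = 44331, 64625, 3

n_values = height * width * channels
bytes_uint8 = n_values * 1   # one byte per value
bytes_uint64 = n_values * 8  # eight bytes per value

int32_max = 2**31 - 1        # assumed limit behind "size does not fit in int"

print(f"values:        {n_values:,}")
print(f"uint8 size:    {bytes_uint8 / 1e9:.2f} GB")
print(f"uint64 size:   {bytes_uint64 / 1e9:.2f} GB")
print(f"exceeds int32: {n_values > int32_max}")
```

If these assumptions hold, the uint64 copy alone needs roughly 69 GB, which is more than the 64 GB of RAM available.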
Then I tried to load the numpy array using memmap:
import os
from os import path
import numpy as np

# curr_path, f, i and dir are defined earlier in my script
im = np.load(curr_path)
shapeIm = im.shape  # shape of the image
name_no_ext = os.path.splitext(f[i])[0]
filename = path.join(dir, name_no_ext + '.tif')  # filename to save the image file
# Create a memmap with dtype and shape that matches our data
# (memmap reads/writes very large files in chunks directly from disk):
fp = np.memmap(filename, dtype='uint8', mode='w+', shape=shapeIm)
# Write data to memmap array:
fp[:] = im[:]
fp.filename == path.abspath(filename)
# Deletion flushes memory changes to disk before removing the object:
del fp
# Load the memmap and verify data was stored:
newfp = np.memmap(filename, dtype='uint8', mode='r+', shape=shapeIm)
The code above produces a file with a .tif extension, but I cannot process it and could not work out why. When I try to read that image and print its shape, I get:
AttributeError: 'NoneType' object has no attribute 'shape'
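One possible explanation, which I have not confirmed: np.memmap writes only the raw array bytes, so a file given a .tif name this way has no TIFF header, and an image reader may fail to decode it (some readers return None in that case instead of raising). Below is a minimal sketch with a small stand-in array and a temporary path (both hypothetical), showing that such a file can still be read back with np.memmap when the same dtype and shape are supplied.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the real slide data (hypothetical shape and values).
shape = (4, 5, 3)
data = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape)

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "example.tif")  # .tif in name only

# Write the raw bytes through a memmap, then flush and release it.
fp = np.memmap(filename, dtype="uint8", mode="w+", shape=shape)
fp[:] = data
fp.flush()
del fp

# The file holds exactly one byte per value and nothing else (no header).
file_size = os.path.getsize(filename)

# A TIFF file starts with b"II*\x00" or b"MM\x00*"; check the first 4 bytes.
with open(filename, "rb") as fh:
    magic = fh.read(4)
looks_like_tiff = magic in (b"II*\x00", b"MM\x00*")

# Reading it back works as a raw memmap with the same dtype and shape.
restored = np.memmap(filename, dtype="uint8", mode="r", shape=shape)
same = bool(np.array_equal(restored, data))
print(file_size, looks_like_tiff, same)
```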
So this approach also failed. Then I tried reshaping the numpy array into the shape of the image, which is (44331, 64625, 3), and got the following error:
ValueError: sequence too large; cannot be greater than 32
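For reference, this is the kind of reshape I am after, sketched on a small hypothetical array. It assumes the flat array is stored row by row with the three channels interleaved (C order), which I have not verified for my data.

```python
import numpy as np

# Small hypothetical stand-in for (44331, 64625, 3).
height, width, channels = 4, 6, 3

flat = np.arange(height * width * channels, dtype=np.uint8)

# Pass the target shape as a tuple of ints; for a contiguous 1D array
# reshape returns a view, so no extra copy of the data is made.
image = flat.reshape((height, width, channels))

# Pixel (row y, column x), channel c maps to this flat index in C order.
y, x, c = 2, 3, 1
flat_index = (y * width + x) * channels + c

print(image.shape)
print(image[y, x, c] == flat[flat_index])
```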
Can anyone help me process such an image? I have annotations for these images as x, y, z pixel locations, and to use these annotations as ground truth I need to convert my numpy array into the form of an image.
Any help would be great.
Edit: I got reshaping the numpy array working now, but I still do not know how to use numpy files as my dataset input instead of images.
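One direction I am considering, as a rough sketch only: open the .npy file with np.load(..., mmap_mode="r") so that it stays on disk, and cut tiles out of it on demand. The file name, the tile size, and the iter_tiles helper below are hypothetical, and the small array stands in for a real slide.

```python
import os
import tempfile

import numpy as np


def iter_tiles(image, tile_size):
    """Yield (y, x, tile) for non-overlapping tiles; edge tiles may be smaller."""
    height, width = image.shape[:2]
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield y, x, image[y:y + tile_size, x:x + tile_size]


# Small hypothetical stand-in for a slide saved as an (H, W, 3) .npy file.
npy_path = os.path.join(tempfile.mkdtemp(), "slide.npy")
np.save(npy_path, np.zeros((10, 7, 3), dtype=np.uint8))

# mmap_mode="r" maps the file instead of reading it all into RAM.
slide = np.load(npy_path, mmap_mode="r")

# np.array(tile) copies just that one tile into memory.
tiles = [(y, x, np.array(tile)) for y, x, tile in iter_tiles(slide, tile_size=4)]
print(len(tiles), tiles[0][2].shape, tiles[-1][2].shape)
```

With tiles like these, an annotation at slide position (x, y) that falls inside the tile starting at (tile_x, tile_y) would sit at (x - tile_x, y - tile_y) within that tile.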