How to work with 3D images in Keras ImageDataGenerator flow_from_dataframe


I want to estimate numerical values from 3D images, so I want to combine a 3D CNN with regression. I am working on 3D image data stored as .raw files with shape (200, 200, 200). When I try to use the Keras ImageDataGenerator to fit the model, it throws the following error:

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f0b2a5bc400>

It seems that PIL cannot open 3D image data. So how can I preprocess and load the images before using the flow_from_dataframe function?

training_datagen = ImageDataGenerator()

train_generator = training_datagen.flow_from_dataframe(
        dataframe=df_train,
        directory="./patches",
        x_col="Images",
        y_col="Permeability",
        target_size=(200, 200,200),
        batch_size=33,
        class_mode='other',
        validate_filenames=False)

validation_datagen = ImageDataGenerator()

val_generator = validation_datagen.flow_from_dataframe(
        dataframe=df_validate,
        directory="./patches",
        x_col="Images",
        y_col="Permeability",
        target_size=(200, 200,200),
        class_mode='other',
        validate_filenames=False) 

There is 1 answer below.


tf.keras.preprocessing.image.ImageDataGenerator is a highly integrated API, which leaves little room for customization, especially when dealing with 3D data. Additionally, this API is deprecated (see the tf 2.9 API docs).

So my suggestion is to use the tf.data.Dataset API to build your data pipeline, as long as you can get a NumPy array or a tf tensor from the raw data.

Here are two simple examples. The first uses from_tensor_slices:

import tensorflow as tf
import numpy as np

raw_data = [np.random.normal(size=[200,200,200]) for _ in range(20)]
def map_func(x):
    # if you need to preprocess the data,
    # write that logic here
    return x
train_data_pipeline = tf.data.Dataset.from_tensor_slices(raw_data[0:15])\
                                     .map(map_func)\
                                     .batch(5)
val_data_pipeline = tf.data.Dataset.from_tensor_slices(raw_data[15:])\
                                   .map(map_func)\
                                   .batch(5)
for item in train_data_pipeline:
    print(item.shape)
    # (5, 200, 200, 200)
    # (5, 200, 200, 200)
    # (5, 200, 200, 200)
for item in val_data_pipeline:
    print(item.shape)
    # (5, 200, 200, 200)
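Since the question is about regression, each volume also needs a numerical target. from_tensor_slices accepts a tuple of (inputs, targets) and then yields (x, y) pairs that model.fit can consume directly. A minimal sketch (smaller 16x16x16 volumes are used here only to keep the illustration light; the real shape would be (200, 200, 200)):

```python
import tensorflow as tf
import numpy as np

# small volumes for illustration; the real shape would be (200, 200, 200)
volumes = np.random.normal(size=[20, 16, 16, 16]).astype(np.float32)
targets = np.random.normal(size=[20]).astype(np.float32)  # e.g. permeability values

# a tuple of (inputs, targets) makes the dataset yield (x, y) pairs
labeled_pipeline = tf.data.Dataset.from_tensor_slices((volumes, targets)).batch(5)

for x, y in labeled_pipeline:
    print(x.shape, y.shape)  # (5, 16, 16, 16) (5,)
```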

The second uses from_generator:

import tensorflow as tf
import numpy as np

# cast to float32 so the yielded arrays match the TensorSpec dtype below
raw_data = [np.random.normal(size=[200,200,200]).astype(np.float32) for _ in range(20)]
def gen_func1():
    yield from raw_data[:15]
def gen_func2():
    yield from raw_data[15:]
def map_func(x):
    # if you need to preprocess the data,
    # write that logic here
    return x
train_data_pipeline = tf.data.Dataset.from_generator(gen_func1,
                                                     output_signature=(tf.TensorSpec(shape=[200,200,200],dtype=tf.float32)))\
                                     .map(map_func)\
                                     .batch(5)
val_data_pipeline = tf.data.Dataset.from_generator(gen_func2,
                                                   output_signature=(tf.TensorSpec(shape=[200,200,200],dtype=tf.float32)))\
                                   .map(map_func)\
                                   .batch(5)
for item in train_data_pipeline:
    print(item.shape)
    # (5, 200, 200, 200)
    # (5, 200, 200, 200)
    # (5, 200, 200, 200)
for item in val_data_pipeline:
    print(item.shape)
    # (5, 200, 200, 200)

Many built-in methods such as batch, shuffle, take, and map can be used to customize the data pipeline. The most important one is map, which lets you preprocess the data on the fly.
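To connect this back to the original question: the .raw volumes can be read with np.fromfile inside the generator and paired with the Permeability column from the dataframe. A hedged sketch, assuming each .raw file is a raw float32 buffer of exactly 200*200*200 values (the dtype, byte order, and filenames are assumptions; here a temporary directory with fake files stands in for ./patches so the snippet is self-contained):

```python
import os
import tempfile
import numpy as np
import pandas as pd
import tensorflow as tf

SHAPE = (200, 200, 200)  # volume shape from the question

def load_raw(path, dtype=np.float32):
    # assumption: each .raw file is a raw binary buffer of exactly
    # 200*200*200 values; adjust dtype/order to match your data
    return np.fromfile(path, dtype=dtype).reshape(SHAPE)

# --- demo setup: fake .raw files and a dataframe shaped like df_train ---
patches_dir = tempfile.mkdtemp()
rows = []
for i in range(2):
    name = f"sample_{i}.raw"
    np.random.normal(size=SHAPE).astype(np.float32).tofile(
        os.path.join(patches_dir, name))
    rows.append({"Images": name, "Permeability": float(i)})
df = pd.DataFrame(rows)

def gen():
    # yields (volume, target) pairs, one per dataframe row
    for _, row in df.iterrows():
        x = load_raw(os.path.join(patches_dir, row["Images"]))
        yield x, np.float32(row["Permeability"])

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=SHAPE, dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
).batch(2)

for x, y in ds:
    print(x.shape, y.shape)  # (2, 200, 200, 200) (2,)
```

With the real data you would point patches_dir at ./patches and build gen from df_train; the resulting dataset can then be passed straight to model.fit for a 3D CNN regression model.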