How can I create a custom data generator for multiple inputs using Keras (tf.keras.utils.Sequence)?


The model takes four inputs and produces one output. Of those four inputs, two are numerical, one is categorical and one is an image. The output is binary (0 or 1). I need to create a custom data generator that can take those inputs from the dataframe and feed them into the model.
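For reference, the model described above could look roughly like this minimal sketch using the Keras functional API. The input names (image, num1, num2, cat), layer sizes, image size and n_categories are all assumptions, and the categorical feature is assumed to already be integer-coded so it can go through an Embedding layer.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(img_size=(224, 224), n_categories=10):
    """Hypothetical four-input model: image, two numeric features, one categorical feature."""
    image_in = layers.Input(shape=img_size + (3,), name="image")
    num1_in = layers.Input(shape=(1,), name="num1")
    num2_in = layers.Input(shape=(1,), name="num2")
    cat_in = layers.Input(shape=(1,), dtype="int32", name="cat")  # integer category code

    # Small CNN branch for the image input
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Embed the category code into a dense vector: (batch, 1, 4) -> (batch, 4)
    c = layers.Embedding(input_dim=n_categories, output_dim=4)(cat_in)
    c = layers.Flatten()(c)

    # Join all branches and predict a single probability for the binary output
    merged = layers.Concatenate()([x, num1_in, num2_in, c])
    h = layers.Dense(32, activation="relu")(merged)
    out = layers.Dense(1, activation="sigmoid")(h)

    model = tf.keras.Model(inputs=[image_in, num1_in, num2_in, cat_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Whatever generator is used then has to yield the four inputs in the same order as the `inputs=[...]` list (or as a dict keyed by those input names), together with the 0/1 target.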

I feed the images into a CNN. The image dataset is too large to load into memory at once, so I cannot feed it to the model without a data generator.

How can I feed those images into the model in batches? It would be very helpful to learn how to create a custom data generator for any specific model.
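Since the question asks specifically about tf.keras.utils.Sequence, here is a minimal sketch of one for this setup. It assumes a DataFrame with columns filepaths, num1, num2, cat (already integer-coded) and label (the 0/1 target); those column names, the class name MultiInputSequence and the dict keys are hypothetical and must match the names of the model's Input layers. Images are read with PIL only when their batch is requested, so the full dataset never has to fit in memory.

```python
import math

import numpy as np
import pandas as pd
import tensorflow as tf
from PIL import Image


class MultiInputSequence(tf.keras.utils.Sequence):
    """Sketch: yields ({'image', 'num1', 'num2', 'cat'}, label) batches from a DataFrame."""

    def __init__(self, df: pd.DataFrame, batch_size=32, img_size=(224, 224),
                 shuffle=True, seed=123):
        super().__init__()
        self.df = df.reset_index(drop=True)
        self.batch_size = batch_size
        self.img_size = img_size  # (height, width)
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indices = np.arange(len(self.df))
        if self.shuffle:
            self.rng.shuffle(self.indices)

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller
        return math.ceil(len(self.df) / self.batch_size)

    def _load_image(self, path):
        # Read one image from disk, resize it and scale pixels to [0, 1]
        img = Image.open(path).convert("RGB")
        img = img.resize((self.img_size[1], self.img_size[0]))  # PIL wants (width, height)
        return np.asarray(img, dtype="float32") / 255.0

    def __getitem__(self, idx):
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        rows = self.df.iloc[batch_idx]
        inputs = {
            "image": np.stack([self._load_image(p) for p in rows["filepaths"]]),
            "num1": rows["num1"].to_numpy(dtype="float32").reshape(-1, 1),
            "num2": rows["num2"].to_numpy(dtype="float32").reshape(-1, 1),
            "cat": rows["cat"].to_numpy(dtype="int32").reshape(-1, 1),
        }
        labels = rows["label"].to_numpy(dtype="float32").reshape(-1, 1)
        return inputs, labels

    def on_epoch_end(self):
        # Reshuffle the row order between epochs
        if self.shuffle:
            self.rng.shuffle(self.indices)
```

Usage would then be along the lines of `model.fit(MultiInputSequence(train_df), validation_data=MultiInputSequence(valid_df, shuffle=False), epochs=10)`. The same pattern adapts to other models by changing which columns are read in `__getitem__` and how each one is preprocessed.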

Thank You.


Answer


You might not need to use tf.keras.utils.Sequence; I think you can do this with ImageDataGenerator.flow_from_dataframe instead. Let's assume you have a dataframe called df with the following columns:

column 0: filepaths, the full path to each image file
column 1: num1, the first numerical data column
column 2: num2, the second numerical data column
column 3: cat, the categorical data column
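If cat holds strings, it probably needs to be integer-coded first, because class_mode='raw' hands the y_col values back unchanged and string values would arrive as an object array rather than numbers. A minimal sketch with made-up rows; the label column (the binary target) is a hypothetical name, since the layout above does not say where the target lives.

```python
import pandas as pd

# Made-up example rows; in practice df would be loaded from a CSV or similar
df = pd.DataFrame({
    "filepaths": ["imgs/a.jpg", "imgs/b.jpg", "imgs/c.jpg"],
    "num1": [0.5, 1.2, 3.3],
    "num2": [10.0, 20.0, 15.0],
    "cat": ["red", "green", "red"],
    "label": [0, 1, 1],  # hypothetical binary target column
})

# Replace each category string with an integer code (categories are sorted: green=0, red=1)
df["cat"] = df["cat"].astype("category").cat.codes
```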

Now create a list of the input column names:

input_list=['num1', 'num2', 'cat']

Now create the generator:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

bs=30 # batch_size
img_size=(224,224) # image size to use
gen=ImageDataGenerator(rescale=1/255) # scale pixel values to [0, 1]
train_gen=gen.flow_from_dataframe(df, x_col='filepaths', y_col=input_list,
                                  target_size=img_size, batch_size=bs, shuffle=True,
                                  seed=123, class_mode='raw', color_mode='rgb')

Make sure class_mode is set to 'raw' so the values in the y_col columns are returned unchanged as an array. To test the generator, try this code:

images, labels = next(train_gen)
print(images.shape)  # should print (30, 224, 224, 3)
print(labels.shape)  # should print (30, 3)
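Note that with this setup the generator returns num1, num2 and cat as the "labels", whereas the model needs them as inputs next to the image, and it also needs the binary target. One way to bridge that, assuming the target is stored in a column named label (a hypothetical name) that is appended last to y_col, is to wrap the iterator and split each raw batch. The function name split_raw_batches is also hypothetical.

```python
import numpy as np


def split_raw_batches(flow, n_inputs=3):
    """Turn (images, raw) batches into ([images, col_0, ..., col_{n-1}], target).

    Assumes raw has shape (batch, n_inputs + 1): the first n_inputs columns are
    extra model inputs (e.g. num1, num2, cat) and the last column is the target.
    """
    for images, raw in flow:
        # All columns become float32; cat codes arrive as floats and may need casting
        raw = np.asarray(raw, dtype="float32")
        extra_inputs = [raw[:, i:i + 1] for i in range(n_inputs)]  # each (batch, 1)
        target = raw[:, n_inputs:n_inputs + 1]                     # (batch, 1)
        yield [images] + extra_inputs, target
```

With `y_col=['num1', 'num2', 'cat', 'label']`, training would look like `model.fit(split_raw_batches(train_gen), steps_per_epoch=len(train_gen), epochs=10)`. Because flow_from_dataframe iterators loop forever, steps_per_epoch tells Keras where an epoch ends, and the model's inputs must be declared in the order image, num1, num2, cat.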

I have used this approach when all the columns in input_list were numeric and was able to train a model. I am not sure whether it works for a mixture of numeric and categorical inputs, but I think it will. Of course, you may first want to partition df into train_df, test_df and valid_df using sklearn's train_test_split. In that case, make a train, test and valid generator, and set shuffle=False in the test generator. Let me know if this works.
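The three-way split mentioned above can be done with two calls to train_test_split: first hold out the valid and test rows together, then split that held-out part in two. A minimal sketch; the function name split_dataframe, the default fractions and the optional stratify_col are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_dataframe(df: pd.DataFrame, valid_size=0.1, test_size=0.1, seed=123,
                    stratify_col=None):
    """Split df into train/valid/test; sizes are fractions of the full df."""
    strat = df[stratify_col] if stratify_col else None
    # First split: train vs. everything that will become valid + test
    train_df, rest_df = train_test_split(
        df, test_size=valid_size + test_size, random_state=seed, stratify=strat)
    # Second split: divide the held-out rows into valid and test
    strat_rest = rest_df[stratify_col] if stratify_col else None
    valid_df, test_df = train_test_split(
        rest_df, test_size=test_size / (valid_size + test_size),
        random_state=seed, stratify=strat_rest)
    return train_df, valid_df, test_df
```

Each part then gets its own `gen.flow_from_dataframe(...)` call; passing `shuffle=False` for the test generator keeps its predictions in the same row order as test_df, so they can be compared against the true labels.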