How to create TensorFlow dataset batches for variable-shape inputs?


I have a dataset for image captioning. Each image has a different number of captions (sentences): some images have seven captions, while others have ten or more. I used the following code to create the dataset:

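import tensorflow as tf

def make_dataset(videos, captions):
    # Captions are ragged: each video has a different number of captions.
    dataset = tf.data.Dataset.from_tensor_slices((videos, tf.ragged.constant(captions)))
    dataset = dataset.shuffle(BATCH_SIZE * 8)
    dataset = dataset.map(process_input, num_parallel_calls=AUTOTUNE)
    # batch() stacks dense tensors, so every element in a batch must have the same shape.
    dataset = dataset.batch(BATCH_SIZE).prefetch(AUTOTUNE)

    return dataset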

This code works only when BATCH_SIZE = 1. When I set BATCH_SIZE to 2 or more, I get the following error:

InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [7,20], [batch]: [10,20] [Op:IteratorGetNext]

Is there a way to merge these data in batches without using padding?
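
One possible direction, offered as a minimal sketch rather than a confirmed fix: recent TensorFlow 2.x releases provide `tf.data.Dataset.ragged_batch`, which batches elements whose shapes differ into a `tf.RaggedTensor` instead of padding them (older releases expose similar behaviour as `tf.data.experimental.dense_to_ragged_batch`, applied through `dataset.apply`). The toy generator, the feature size of 4, and the caption counts below are assumptions chosen to mirror the `[7,20]` and `[10,20]` shapes in the error message; they stand in for the real `process_input` output, which is not shown in the question.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 2  # assumed value, for illustration only


def toy_examples():
    # Hypothetical stand-ins for processed elements: a fixed-size feature
    # vector per video, plus a variable number of 20-token captions.
    for num_captions in (7, 10):
        video = np.zeros([4], dtype=np.float32)
        captions = np.zeros([num_captions, 20], dtype=np.int32)
        yield video, captions


dataset = tf.data.Dataset.from_generator(
    toy_examples,
    output_signature=(
        tf.TensorSpec(shape=[4], dtype=tf.float32),
        # None marks the dimension that varies from element to element.
        tf.TensorSpec(shape=[None, 20], dtype=tf.int32),
    ),
)

# ragged_batch keeps fully defined components dense and turns components
# with an unknown dimension into RaggedTensors, so no padding is added.
batched = dataset.ragged_batch(BATCH_SIZE)

videos, captions = next(iter(batched))
print(type(captions).__name__)
print(captions.row_lengths())
```

Whether this helps depends on the model: layers that consume the captions must either accept ragged inputs or receive an explicit conversion. An alternative that also avoids padding is to flatten the data so that each dataset element is a single (video, caption) pair, which gives every element the same shape.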
