I needed to convert my Sequence-based data generator to the tf.data.Dataset format. For that purpose, I used the from_generator function to create a repeating, batched dataset for each of my training, validation and test sets.

  dataset = tf.data.Dataset.from_generator(gen_function,
                                           output_signature=output_signature)
  dataset = dataset.shuffle(shuffle_buffer,
                            reshuffle_each_iteration=True)
  dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
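For context, a minimal self-contained sketch of such a pipeline follows. The generator body, tensor shapes and dtypes are placeholders for illustration, not the actual OCR data:

```python
import numpy as np
import tensorflow as tf

# Hypothetical generator yielding (image, label) pairs; the shapes and
# dtypes here are illustrative placeholders.
def gen_function():
    for _ in range(4):
        image = np.zeros((32, 128, 1), dtype=np.float32)
        label = np.zeros((10,), dtype=np.int32)
        yield image, label

# output_signature tells tf.data the structure of each yielded element.
output_signature = (
    tf.TensorSpec(shape=(32, 128, 1), dtype=tf.float32),
    tf.TensorSpec(shape=(10,), dtype=tf.int32),
)

dataset = tf.data.Dataset.from_generator(gen_function,
                                         output_signature=output_signature)
dataset = dataset.shuffle(4, reshuffle_each_iteration=True)
dataset = dataset.repeat()   # makes the dataset infinite
dataset = dataset.batch(2)
```

Because of the `repeat()` call, iteration never raises end-of-data, which is why Keras later needs explicit step counts.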

These datasets were then passed to model fitting:

OCR.model.fit(x=training_generator,
              validation_data=validation_generator,
              steps_per_epoch=steps_per_epoch, 
              epochs=epochs,
              use_multiprocessing=True,
              callbacks=callbacks,
              workers=workers,
              verbose=verbose)

Which resulted in the following error:

    /user/.../python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py, 
    line 739, in _validate_args raise ValueError(
    ValueError: When providing an infinite dataset, you must specify the number of 
    steps to run (if you did not intend to create an infinite dataset, make sure to 
    not call `repeat()` on the dataset).
    [date time]: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error 
    occurred when finalizing GeneratorDataset iterator: Failed precondition: Python 
    interpreter state is not initialized. The process may be terminated.
    [[{{node PyFunc}}]]

This was confusing because I had specified the number of steps for my repeating infinite dataset, as the error suggests. Moreover, the same steps_per_epoch setting had worked before, when I was still using a Sequence-based data generator.

There is 1 answer below

Best answer

The solution was simple: in addition to steps_per_epoch, you also need to specify the validation_steps parameter in the fit call. With an infinite (repeated) validation dataset, Keras cannot infer how many validation batches make up one pass, so it must be told explicitly.
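As a minimal sketch of how the two step counts can be derived, assuming you know your sample counts (the numbers below are hypothetical, and the commented fit call mirrors the question's arguments):

```python
import math

# Hypothetical sample counts and batch size for illustration.
n_train, n_val, batch_size = 1000, 200, 32

# With repeat()-ed datasets, Keras cannot infer epoch length, so both
# step counts must be passed explicitly: one "epoch" is defined as one
# full pass over the underlying samples.
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)

# model.fit(x=train_dataset,
#           validation_data=val_dataset,
#           steps_per_epoch=steps_per_epoch,
#           validation_steps=validation_steps,
#           epochs=epochs)
```

With both parameters set, the "When providing an infinite dataset" ValueError no longer occurs for either the training or the validation dataset.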