I'm training a Keras regression model to detect skew in images. Training was going fine until I encountered this error:

    W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated. [[{{node PyFunc}}]]

TensorFlow version: 2.10, Python version: 3.10

Here is the relevant code:

    import tensorflow as tf

    def data_generator(image_paths, labels, batch_size):
        num_samples = len(image_paths)
        while True:
            for offset in range(0, num_samples, batch_size):
                batch_images = []
                batch_labels = []
                for i in range(offset, min(offset + batch_size, num_samples)):
                    image_path = image_paths[i]
                    label = labels[i]
                    # Load and preprocess image
                    image = load_and_preprocess_image(image_path)
                    batch_images.append(image)
                    batch_labels.append(label)
                yield tf.stack(batch_images), tf.stack(batch_labels)

    def load_and_preprocess_image(image_path):
        # Load and preprocess your image according to your needs
        # Example implementation using tf.keras.preprocessing.image:
        img = tf.keras.preprocessing.image.load_img(image_path, color_mode='grayscale', target_size=(224, 224))
        img = tf.keras.preprocessing.image.img_to_array(img)
        img = img / 255.0  # Normalize the image
        return img

    model.fit(
        train_data_gen,
        steps_per_epoch=num_steps,
        epochs=150,
        validation_data=val_data_gen,
        validation_steps=len(val_paths) // batch_size,
        callbacks=[checkpoint_callback, early_stopping_callback, tensorboard_callback]
    )
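
The snippet above does not show how `train_data_gen`, `val_data_gen`, and `num_steps` are created. A minimal sketch of how that wiring might look, assuming hypothetical names (`train_paths`, `train_labels`, `batch_generator`, `steps_per_pass`) and a stub loader in place of `load_and_preprocess_image`; it mirrors the batching logic of `data_generator` in plain Python and is not necessarily what the original script does:

```python
import math

def batch_generator(items, labels, batch_size, load=lambda path: path):
    """Yield (batch_items, batch_labels) forever, batching like data_generator.

    `load` is a stand-in (assumption) for load_and_preprocess_image.
    """
    num_samples = len(items)
    while True:
        for offset in range(0, num_samples, batch_size):
            end = min(offset + batch_size, num_samples)
            yield [load(p) for p in items[offset:end]], list(labels[offset:end])

def steps_per_pass(num_samples, batch_size):
    # The final batch may be shorter, so one full pass over the data
    # takes ceil(num_samples / batch_size) steps, not floor.
    return math.ceil(num_samples / batch_size)

# Hypothetical data, for illustration only
train_paths = [f"img_{i}.png" for i in range(10)]
train_labels = [float(i) for i in range(10)]
batch_size = 4

train_data_gen = batch_generator(train_paths, train_labels, batch_size)
num_steps = steps_per_pass(len(train_paths), batch_size)

# Draw exactly one pass worth of batches and record their sizes
batch_sizes = [len(next(train_data_gen)[0]) for _ in range(num_steps)]
```

Note that `len(val_paths) // batch_size` in the `model.fit` call rounds down, so with this batching scheme the shorter final batch of a pass is not consumed within that pass; whether that matters here is not established.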

I'm using generators because if I pass both datasets directly to model.fit(), the process runs out of memory and is killed by the OOM killer.

Interestingly, when the number of epochs is less than 8 (which I tried out of curiosity), the error does not appear and training completes successfully. When I increase it, the error occurs again.

I have tried several fixes, including asking ChatGPT, creating a new virtual environment, and updating libraries, but none of them worked.
