model.predict() always returns a single array whose length equals the step count


I am trying to create a model for this Kaggle competition. Much of the code I am using comes from that competition's getting-started notebook. For extra context, I am using TensorFlow 2.15.0 and I am writing and running this code in a Kaggle notebook.

To summarize, the dataset consists of images of 104 different species of flowers, loaded from TFRecord files and split into training, validation, and test sets.

When pre-processing the training data, I duplicate the dataset a few times and apply data augmentation to all the images in that dataset.

For my model, I am using the Xception model pre-trained on ImageNet, with a new 104-class classification head attached at the end.

The model trains fine, although it logs some odd messages along the way:

model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node StatefulPartitionedCall.
model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node AssignVariableOp.

import math, re, os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

print("Tensorflow version " + tf.__version__)

# Detect TPU, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()

print("REPLICAS: ", strategy.num_replicas_in_sync)

IMAGE_SIZE = [512, 512]
GCS_PATH = "/kaggle/input/tpu-getting-started" + '/tfrecords-jpeg-512x512'
AUTO = tf.data.experimental.AUTOTUNE

TRAINING_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/train/*.tfrec')
VALIDATION_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/val/*.tfrec')
TEST_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/test/*.tfrec')

CLASSES = ['pink primrose',    'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea',     'wild geranium',     'tiger lily',           'moon orchid',              'bird of paradise', 'monkshood',        'globe thistle',         # 00 - 09
           'snapdragon',       "colt's foot",               'king protea',      'spear thistle', 'yellow iris',       'globe-flower',         'purple coneflower',        'peruvian lily',    'balloon flower',   'giant white arum lily', # 10 - 19
           'fire lily',        'pincushion flower',         'fritillary',       'red ginger',    'grape hyacinth',    'corn poppy',           'prince of wales feathers', 'stemless gentian', 'artichoke',        'sweet william',         # 20 - 29
           'carnation',        'garden phlox',              'love in the mist', 'cosmos',        'alpine sea holly',  'ruby-lipped cattleya', 'cape flower',              'great masterwort', 'siam tulip',       'lenten rose',           # 30 - 39
           'barberton daisy',  'daffodil',                  'sword lily',       'poinsettia',    'bolero deep blue',  'wallflower',           'marigold',                 'buttercup',        'daisy',            'common dandelion',      # 40 - 49
           'petunia',          'wild pansy',                'primula',          'sunflower',     'lilac hibiscus',    'bishop of llandaff',   'gaura',                    'geranium',         'orange dahlia',    'pink-yellow dahlia',    # 50 - 59
           'cautleya spicata', 'japanese anemone',          'black-eyed susan', 'silverbush',    'californian poppy', 'osteospermum',         'spring crocus',            'iris',             'windflower',       'tree poppy',            # 60 - 69
           'gazania',          'azalea',                    'water lily',       'rose',          'thorn apple',       'morning glory',        'passion flower',           'lotus',            'toad lily',        'anthurium',             # 70 - 79
           'frangipani',       'clematis',                  'hibiscus',         'columbine',     'desert-rose',       'tree mallow',          'magnolia',                 'cyclamen ',        'watercress',       'canna lily',            # 80 - 89
           'hippeastrum ',     'bee balm',                  'pink quill',       'foxglove',      'bougainvillea',     'camellia',             'mallow',                   'mexican petunia',  'bromelia',         'blanket flower',        # 90 - 99
           'trumpet creeper',  'blackberry lily',           'common tulip',     'wild rose']                                                                                                                                               # 100 - 103


def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE, 3]) # explicit size needed for TPU
    return image

def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "class": tf.io.FixedLenFeature([], tf.int64),  # shape [] means single element
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)
    return image, label # returns a dataset of (image, label) pairs

def read_unlabeled_tfrecord(example):
    UNLABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "id": tf.io.FixedLenFeature([], tf.string),  # shape [] means single element
        # class is missing, this competition's challenge is to predict flower classes for the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id']
    return image, idnum # returns a dataset of image(s)

def load_dataset(filenames, labeled=True, ordered=False):
    # Read from TFRecords. For optimal performance, reading from multiple files at once and
    # disregarding data order. Order does not matter since we will be shuffling the data anyway.

    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False # disable order, increase speed

    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO) # automatically interleaves reads from multiple files
    dataset = dataset.with_options(ignore_order) # uses data as soon as it streams in, rather than in its original order
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord, num_parallel_calls=AUTO)
    # returns a dataset of (image, label) pairs if labeled=True or (image, id) pairs if labeled=False
    return dataset

# data augmentation model
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.Resizing(IMAGE_SIZE[0], IMAGE_SIZE[1]),
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),                    # Randomly flip the image horizontally and/or vertically
    tf.keras.layers.RandomRotation(0.2),                                      # Randomly rotate the image by 20%
    tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2),   # Randomly translate the image by up to 20%
    tf.keras.layers.RandomZoom(0.2),                                          # Randomly zoom the image in or out by up to 20%
    tf.keras.layers.RandomContrast(0.2),                                      # Randomly change the contrast of the image by up to 20%
])

# image resizing
resize_image = tf.keras.Sequential([
    tf.keras.layers.Resizing(IMAGE_SIZE[0], IMAGE_SIZE[1])
])

def get_training_dataset(num_augmentations):
    loaded_dataset = load_dataset(TRAINING_FILENAMES, labeled=True)
    dataset = loaded_dataset

    # add additional images to the dataset
    for i in range(num_augmentations):
        dataset = dataset.concatenate(loaded_dataset)

    # perform data augmentation on the dataset
    dataset = dataset.map(lambda image, label: (data_augmentation(image), label), num_parallel_calls=AUTO)
    dataset = dataset.repeat() # the training dataset must repeat for several epochs
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO) # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

def get_validation_dataset(ordered=False):
    dataset = load_dataset(VALIDATION_FILENAMES, labeled=True, ordered=ordered)
    dataset = dataset.map(lambda image, label: (resize_image(image), label), num_parallel_calls=AUTO)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    dataset = dataset.prefetch(AUTO)
    return dataset

def get_test_dataset(ordered=False):
    dataset = load_dataset(TEST_FILENAMES, labeled=False, ordered=ordered)
    dataset = dataset.map(lambda image, ids: (resize_image(image), ids), num_parallel_calls=AUTO)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

def count_data_items(filenames):
    # the number of data items is written in the name of the .tfrec
    # files, i.e. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) for filename in filenames]
    return np.sum(n)

NUM_TRAINING_IMAGES = count_data_items(TRAINING_FILENAMES)
NUM_VALIDATION_IMAGES = count_data_items(VALIDATION_FILENAMES)
NUM_TEST_IMAGES = count_data_items(TEST_FILENAMES)
print('Dataset: {} training images, {} validation images, {} unlabeled test images'.format(NUM_TRAINING_IMAGES, NUM_VALIDATION_IMAGES, NUM_TEST_IMAGES))

# Define the batch size. This will be 16 with TPU off and 128 (=16*8) with TPU on
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
AUGMENTATIONS = 4

ds_train = get_training_dataset(AUGMENTATIONS)
ds_valid = get_validation_dataset()
ds_test = get_test_dataset()

print("Training:", ds_train)
print ("Validation:", ds_valid)
print("Test:", ds_test)

np.set_printoptions(threshold=15, linewidth=80)

print("Training data shapes:")
for image, label in ds_train.take(3):
    print(image.numpy().shape, label.numpy().shape)
print("Training data label examples:", label.numpy())

print("Test data shapes:")
for image, idnum in ds_test.take(3):
    print(image.numpy().shape, idnum.numpy().shape)
print("Test data IDs:", idnum.numpy().astype('U')) # U=unicode string

EPOCHS = 12

with strategy.scope():
    # Xception model pre-trained on the imagenet dataset
    pretrained_model = tf.keras.applications.Xception(
        weights='imagenet',
        include_top=False,
        input_shape=[*IMAGE_SIZE, 3]
    )
    pretrained_model.trainable = True

    model = tf.keras.Sequential([
        pretrained_model,
        # ... attach a new head to act as a classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(units=len(CLASSES), activation='softmax'),
    ])

    # compile model
    model.compile(
        optimizer='adam',
        loss = 'sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'],
    )
model.summary()

# Learning Rate Schedule for Fine Tuning #
def exponential_lr(epoch,
                   start_lr = 0.00001, min_lr = 0.00001, max_lr = 0.00005,
                   rampup_epochs = 5, sustain_epochs = 0,
                   exp_decay = 0.8):

    def lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay):
        # linear increase from start to rampup_epochs
        if epoch < rampup_epochs:
            lr = ((max_lr - start_lr) /
                  rampup_epochs * epoch + start_lr)
        # constant max_lr during sustain_epochs
        elif epoch < rampup_epochs + sustain_epochs:
            lr = max_lr
        # exponential decay towards min_lr
        else:
            lr = ((max_lr - min_lr) *
                  exp_decay**(epoch - rampup_epochs - sustain_epochs) +
                  min_lr)
        return lr
    return lr(epoch,
              start_lr,
              min_lr,
              max_lr,
              rampup_epochs,
              sustain_epochs,
              exp_decay)

lr_callback = tf.keras.callbacks.LearningRateScheduler(exponential_lr, verbose=True)

# Define training epochs
STEPS_PER_EPOCH = NUM_TRAINING_IMAGES * AUGMENTATIONS // BATCH_SIZE

with strategy.scope():
    history = model.fit(
        ds_train,
        validation_data=ds_valid,
        epochs=EPOCHS,
        steps_per_epoch=STEPS_PER_EPOCH,
        callbacks=[lr_callback],
    )

cmdataset = get_validation_dataset(ordered=True)

images_ds = cmdataset.map(lambda image, label: image)
labels_ds = cmdataset.map(lambda image, label: label).unbatch()

cm_correct_labels = next(iter(labels_ds.batch(NUM_VALIDATION_IMAGES))).numpy()
cm_probabilities = model.predict(cmdataset, steps=NUM_VALIDATION_IMAGES//BATCH_SIZE)

print(cm_probabilities)
print(len(cm_probabilities))
print(tf.shape(cm_probabilities))

cm_predictions = np.argmax(cm_probabilities, axis=-1)
print(cm_predictions)

I then ran model.predict() on the validation dataset. What I want to do is see how well the model classifies each individual class and build a confusion matrix from that data, since model.evaluate() only tells me how the model performs overall. I am also planning to use model.predict() to run the model on the test set and submit the predictions.
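
For context, this is roughly the confusion-matrix step I was planning once the predictions come out right (a minimal sketch, assuming scikit-learn is available in the Kaggle environment; it obviously cannot work with the output shown below):

from sklearn.metrics import confusion_matrix

# one predicted class index per validation image (what I expect argmax to give)
cm_predictions = np.argmax(cm_probabilities, axis=-1)
# rows = true classes, columns = predicted classes
cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=list(range(len(CLASSES))))
print(cmat.shape)  # should be (104, 104)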

If I recall correctly, model.predict() should return one array per image in the validation set, containing the predicted probability for each class. Instead, I run into an issue where model.predict() returns a single array whose length equals the number of steps, which is 29 in this case (3712 validation images / 128 batch size):

[127.99999  127.999985 128.       ... 127.999985 127.999985 127.99999 ]
29
tf.Tensor([29], shape=(1,), dtype=int32)
2

I've tried different models (ResNet50V2, VGG16, and a basic CNN) and they all yield the same problem: I only get a single array of size 29. My only guess as to what is going on here is either an issue with my classification layer or with the dataset, perhaps shape related.
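
For what it's worth, this is the shape I was expecting from model.predict(), written out as a quick sanity check (a sketch based on my understanding, not something that currently passes):

# expected: one row of 104 class probabilities per validation image
expected_shape = (NUM_VALIDATION_IMAGES, len(CLASSES))   # (3712, 104)
actual_shape = np.asarray(cm_probabilities).shape        # comes out as (29,) in my runs
print("expected:", expected_shape, "actual:", actual_shape)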
