Is it possible to output tensors of a specific size in 'pixel_values' from a transform using HF's Dataset class?


I am trying to adapt a pretrained ViT to work with 3D images, using a naive approach: a max-pooling layer aggregates the features extracted from each slice before the MLP head. I want to train the model with the Trainer class, so I am using HF's Dataset class with a transform that processes each slice of a 3D image. However, I cannot get the transform to return the whole set of processed slices; it keeps returning only one processed slice per sample.
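For context, the wrapper model looks roughly like this (a minimal sketch, not my exact code; the checkpoint name, head sizes, and number of classes are placeholders):

import torch.nn as nn
import torch.nn.functional as F
from transformers import ViTModel

class ViT3D(nn.Module):
    # Encode each 2D slice with a pretrained ViT, max-pool the
    # per-slice features, then classify with an MLP head.
    def __init__(self, num_classes=10, checkpoint='google/vit-base-patch16-224'):
        super().__init__()
        self.vit = ViTModel.from_pretrained(checkpoint)
        hidden = self.vit.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, pixel_values, labels=None):
        # pixel_values: (batch, num_slices, 3, 224, 224)
        b, s = pixel_values.shape[:2]
        flat = pixel_values.flatten(0, 1)  # (b * s, 3, 224, 224)
        cls = self.vit(pixel_values=flat).last_hidden_state[:, 0]  # CLS token per slice
        pooled = cls.view(b, s, -1).max(dim=1).values  # max-pool over slices
        logits = self.head(pooled)
        loss = F.cross_entropy(logits, labels) if labels is not None else None
        return {'loss': loss, 'logits': logits}  # dict output for Trainer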

Here, processor is an instance of ViTImageProcessor.

import numpy as np
from transformers import ViTImageProcessor

# The checkpoint name is a placeholder; any ViT checkpoint with a
# matching image processor should work.
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')

def preprocess_data(ds, num_slices=28):
    # Reshape each flat sample into a (1, 28, 28, 28) volume.
    reshaped = [np.array(sample).reshape(1, 28, 28, 28) for sample in ds['image']]
    # Repeat the single channel 3 times and pass every slice of every
    # volume to the processor as an independent 2D image.
    inputs = processor(
        [np.repeat(sample, 3, axis=0)[:, :, :, i] for sample in reshaped for i in range(num_slices)],
        return_tensors='pt'
    )
    inputs['labels'] = list(ds['labels'])
    return inputs

The 'pixel_values' tensor should have a shape of (28, 3, 224, 224), and I verified this by printing the tensor's shape inside the transform function. But when I fetch a sample from the dataset with the transform applied, I get a tensor of shape (3, 224, 224). I also tried stacking the slices along an extra dimension, but then the first dimension gets ignored. Why does the transform behave like this?
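For reference, this is roughly how I apply the transform and check the shape (assuming ds is a datasets.Dataset with 'image' and 'labels' columns, and that the transform is attached with with_transform):

ds = ds.with_transform(preprocess_data)

# Printing inside preprocess_data shows pixel_values containing all 28
# slices, but indexing a single example only returns one of them:
sample = ds[0]
print(sample['pixel_values'].shape)  # torch.Size([3, 224, 224]), expected (28, 3, 224, 224)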

I tried using my own training loop as well, but my model keeps failing to converge, so I really want to try using a Trainer instance.

Thanks.
