Loading image bounding boxes gives "stack expects each tensor to be equal size" error


I am trying to create a PyTorch dataloader for my dataset. Each image contains a certain number of cars and a bounding box for each of them; not all images have the same number of bounding boxes.

You probably won't be able to run it, but here is some info. This is my Dataset class:

import os

import torch
from skimage import io
from torch.utils.data import Dataset, DataLoader
from torchvision import tv_tensors
from torchvision.transforms import v2


class AGR_Dataset(Dataset):
    def __init__(self, annotations_root, img_root, transform=None):
        """
        Arguments:
            annotations_root (string): Directory with the annotation .txt files.
            img_root (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.annotations_root = annotations_root
        self.img_root = img_root
        self.transform = transform

    def __len__(self):
        # The number of samples is the number of images, not the length of the path string
        return len(os.listdir(self.img_root))
    
    def __getitem__(self, idx):
        # idx may be the index or the image name; I think it is the image name
        if torch.is_tensor(idx):
            idx = idx.tolist()
        
        idx_name = os.listdir(self.img_root)[idx]
        # print(idx_name)
        
        img_name = os.path.join(self.img_root, idx_name)
        annotation_data = os.path.join(self.annotations_root, f"{idx_name.removesuffix('.jpg')}.txt")
        # print(img_name, annotation_data)

        image = io.imread(img_name)

        with open(annotation_data, 'r') as file:
            lines = file.readlines()
            img_data = []
            img_labels = []
            for line in lines:
                line = line.split(',')
                line = [i.strip() for i in line]
                line = [float(num) for num in line[0].split()]
                img_labels.append(int(line[0]))
                img_data.append(line[1:])

        boxes = tv_tensors.BoundingBoxes(img_data, format='CXCYWH', canvas_size=(image.shape[0], image.shape[1]))

        # sample = {'image': image, 'bbox': boxes, 'labels': img_labels}
        sample = {'image': image, 'bbox': boxes}

        if self.transform:
            sample = self.transform(sample)

        print(sample['image'].shape)
        print(sample['bbox'].shape)
        # print(sample['labels'].shape)
        return sample

I define my transforms and create the DataLoader:

data_transform = v2.Compose([
    v2.ToImage(),
    # v2.Resize(680),
    v2.RandomResizedCrop(size=(680, 680), antialias=True),
    # v2.ToDtype(torch.float32, scale=True),
    v2.ToTensor()
])

transformed_dataset = AGR_Dataset(f'{annotations_path}/test/', 
                        f'{img_path}/test/',
                        transform=data_transform)

dataloader = DataLoader(transformed_dataset, batch_size=2,
                        shuffle=False, num_workers=0)

Then I try to iterate through it with this, and eventually want to view an image with its bounding boxes.

for i, sample in enumerate(dataloader):
    print(i, sample)
    print(i, sample['image'].size(), sample['bbox'].size())

    if i == 4:
        break

With a batch size of 1 it runs properly; with a batch size of 2, I get this error:

torch.Size([3, 680, 680])
torch.Size([12, 4])

torch.Size([3, 680, 680])
torch.Size([259, 4])

RuntimeError: stack expects each tensor to be equal size, but got [12, 4] at entry 0 and [259, 4] at entry 1
  1. I believe it is due to the number of bounding boxes not being equal, but how do I overcome this?
  2. Do I need ToTensor in my transforms? I am starting to think I don't, since v2 uses ToImage() and ToTensor is being deprecated (a sketch of the pipeline without it is below).
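This is what I think the pipeline would look like without ToTensor, based on the v2 docs recommending ToImage followed by ToDtype as the replacement (my sketch, not verified on my data):

import torch
from torchvision.transforms import v2

# Sketch of the transform pipeline without ToTensor (assumption: ToImage + ToDtype
# is the v2 replacement for ToTensor's conversion and scaling).
data_transform = v2.Compose([
    v2.ToImage(),                                    # numpy HWC image -> tv_tensors.Image (CHW)
    v2.RandomResizedCrop(size=(680, 680), antialias=True),
    v2.ToDtype(torch.float32, scale=True),           # float conversion + scaling to [0, 1]
])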

Any other comments or help would be appreciated. I am not sure how to create a minimal working example; I will keep trying.

What I have tried: I tried not loading the bounding boxes as tensors, by commenting out the tv_tensors.BoundingBoxes line in the Dataset, but then for some reason my resize doesn't work properly.
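A small sketch of what I think is going on there (my assumption, not verified): the v2 geometric transforms only update box coordinates when the boxes are wrapped as tv_tensors.BoundingBoxes, so without the wrapper the resize leaves the raw box values untouched.

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

resize = v2.Resize(size=(340, 340), antialias=True)

# Dummy image and one dummy box, just to show the wrapping.
img = tv_tensors.Image(torch.zeros(3, 680, 680))
boxes = tv_tensors.BoundingBoxes(torch.tensor([[340., 340., 100., 100.]]),
                                 format='CXCYWH', canvas_size=(680, 680))

_, out_boxes = resize(img, boxes)
print(out_boxes)  # coordinates rescaled to the new 340x340 canvas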

I also just tried splitting the bounding boxes and images like this in the Dataset:

sample = image
target = {'bbox': boxes, 'labels': img_labels}

No luck with that
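For reference, this is the (image, target) structure I was trying to copy from the torchvision detection tutorial; the values below are dummies just to show the shapes and types (my sketch, field names assumed):

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

image = tv_tensors.Image(torch.zeros(3, 680, 680))
target = {
    'boxes': tv_tensors.BoundingBoxes(torch.tensor([[340., 340., 50., 50.]]),
                                      format='CXCYWH', canvas_size=(680, 680)),
    'labels': torch.tensor([1], dtype=torch.int64),
}

transform = v2.RandomResizedCrop(size=(680, 680), antialias=True)
image, target = transform(image, target)  # the boxes inside the dict are transformed along with the image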


There is 1 answer below

Conner Carriere

I have found an answer to the problem.

In the DataLoader, the collate_fn argument needs to be set to the collate_fn from torchvision's detection utils!
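For anyone else hitting this, here is a minimal sketch of that collate_fn and how to pass it to the DataLoader (the one-liner mirrors what torchvision's detection reference scripts use; transformed_dataset is the dataset defined in the question):

from torch.utils.data import DataLoader

# Instead of stacking samples into one tensor (which requires equal shapes),
# keep each sample separate so every image can carry a different number of boxes.
def collate_fn(batch):
    return tuple(zip(*batch))

dataloader = DataLoader(transformed_dataset, batch_size=2,
                        shuffle=False, num_workers=0,
                        collate_fn=collate_fn)

Note that this assumes each sample comes back as an (image, target) tuple, as in the split attempted in the question, rather than as a single dict.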