Stuck while iterating through a DataLoader


I am trying to create a synthetic 3D test dataset of 10000 samples (8000 for training and 2000 for validation) to try out my 3D CNN model. Everything looks fine until I try to look at the first batch of Train_dataloader with next(iter(Train_dataloader)). At that point the call never returns, as if it were stuck in an infinite loop, and the kernel keeps running.

Here is my custom dataset and how I wrap it in Train_dataloader and Test_dataloader:

import numpy as np
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms

class Binary3DDataset(Dataset):
    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        # Only apply the transform if one was given (transform defaults to None)
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# Define data augmentation
data_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),
    transforms.ToTensor(),
])

# Create 10000 random 3D samples, each of shape [1, 32, 32, 32] ([channels, depth, height, width])
num_samples = 10000
voxel_sample = np.random.choice([0, 1], size=(num_samples, 1, 32, 32, 32), p=[0.7, 0.3])

# Create an instance of the custom dataset with data augmentation
augumented_custom_dataset = Binary3DDataset(voxel_sample, transform=data_transform)

# Split into train/test sets and wrap each one in a DataLoader
batch_size = 32
train_dataset, test_dataset = random_split(augumented_custom_dataset, [8000, 2000])

Train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=1)
Test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=1)
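Separately from the hang, the torchvision transforms in the pipeline above are 2D image transforms: RandomHorizontalFlip and RandomAffine expect a PIL image or a tensor of shape [..., H, W], and ToTensor only accepts a PIL image or a 2D/3D (H x W x C) numpy array, so a [1, 32, 32, 32] numpy volume is unlikely to make it through the Compose as written. A minimal sketch of a 3D-aware alternative follows, assuming each sample is converted to a float tensor first. RandomFlip3D and voxel_transform are hypothetical names (not part of torchvision), and only the flip is covered; there is no 3D counterpart of RandomAffine here.

```python
import numpy as np
import torch


class RandomFlip3D:
    """Hypothetical 3D-aware augmentation (not a torchvision class).

    Flips a [C, D, H, W] tensor along one spatial axis with probability p.
    """

    def __init__(self, p=0.5, dim=-1):
        self.p = p
        self.dim = dim  # -1 is the width axis of a [C, D, H, W] tensor

    def __call__(self, volume):
        # torch.rand draws from [0, 1), so p=1.0 always flips and p=0.0 never does
        if torch.rand(1).item() < self.p:
            return torch.flip(volume, dims=(self.dim,))
        return volume


flip = RandomFlip3D(p=0.5)


def voxel_transform(sample):
    """Hypothetical replacement for the torchvision Compose pipeline.

    Converts one [1, 32, 32, 32] numpy sample to a float tensor first,
    then applies the 3D flip, so no 2D-only transform sees the volume.
    """
    volume = torch.from_numpy(np.ascontiguousarray(sample)).float()
    return flip(volume)


# Usage with one sample shaped like the question's data
sample = np.random.choice([0, 1], size=(1, 32, 32, 32), p=[0.7, 0.3])
out = voxel_transform(sample)
print(out.shape, out.dtype)  # torch.Size([1, 32, 32, 32]) torch.float32
```

Passing voxel_transform as transform= to Binary3DDataset would yield float tensors of shape [1, 32, 32, 32], which the default collate function stacks into batches of shape [32, 1, 32, 32, 32].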

When I try to fetch the first batch from Train_dataloader with Python's iter() and next(), the call hangs and never returns:

# Get one batch from the train_dataloader
data_iter = iter(Train_dataloader)
inputs = next(data_iter)
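To check whether the hang is hiding an error, one option is to fetch a single sample in the main process, where no worker is involved. The probe helper below is hypothetical (not a PyTorch API). On the setup above, calling probe(train_dataset) should print a traceback if one of the transforms rejects the numpy volume; rebuilding the loader with num_workers=0 has a similar effect, since next(iter(...)) then raises the exception directly.

```python
import traceback


def probe(dataset, idx=0):
    """Hypothetical debugging helper: fetch one sample without DataLoader workers.

    Any exception raised by __getitem__ (or the transform it calls) is printed
    with a full traceback and None is returned; otherwise the sample is returned.
    """
    try:
        return dataset[idx]
    except Exception:
        traceback.print_exc()
        return None
```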

Printing the lengths of the datasets and dataloaders directly works without any problem; the issue only appears when I iterate over a dataloader.

len(train_dataset),len(test_dataset),len(Train_dataloader),len(Test_dataloader)

output:

(8000, 2000, 250, 63)

So both dataloaders report the expected lengths. I can't figure out why iterating over the dataloader hangs.
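The lengths are consistent with how a map-style DataLoader counts batches: len() is computed from len(dataset) and batch_size alone, and worker processes are only started when iter() is called, so a correct length says nothing about whether loading a batch works. A small sketch of that arithmetic (num_batches is a hypothetical helper, not a PyTorch function):

```python
def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a map-style DataLoader yields per epoch.

    With drop_last=False (the default) the final, smaller batch is kept,
    so the count is the ceiling of num_samples / batch_size.
    """
    if drop_last:
        return num_samples // batch_size
    return (num_samples + batch_size - 1) // batch_size


print(num_batches(8000, 32))  # 250
print(num_batches(2000, 32))  # 63 (62 full batches plus one batch of 16)
print(num_batches(2000, 32, drop_last=True))  # 62
```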

Answer:

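I had the same problem and solved it by removing the num_workers argument from my code, so the DataLoader falls back to its default of num_workers=0 and loads data in the main process.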
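If you later want worker processes back, note that where multiprocessing uses the spawn start method (Windows, and macOS by default on recent Python versions), each worker re-imports the main module and receives a pickled copy of the dataset. The dataset class therefore needs to be importable, and the entry point guarded by if __name__ == "__main__": in a .py script; dataset classes defined inside a notebook cell are a commonly reported cause of this kind of hang. A sketch under those assumptions follows: make_loaders is a hypothetical helper, the 2D torchvision transforms are dropped, and the __main__ block uses a reduced num_samples for a quick check.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset, random_split


class Binary3DDataset(Dataset):
    """Defined at module level so spawned workers can import it."""

    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Convert the [1, D, H, W] integer array straight to a float tensor
        sample = torch.from_numpy(self.data[idx]).float()
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


def make_loaders(num_samples=10000, batch_size=32, num_workers=0, seed=0):
    """Hypothetical helper: build an 80/20 train/validation split and loaders."""
    rng = np.random.default_rng(seed)
    voxels = rng.choice([0, 1], size=(num_samples, 1, 32, 32, 32), p=[0.7, 0.3])
    dataset = Binary3DDataset(voxels)
    n_train = num_samples * 8 // 10
    train_set, val_set = random_split(
        dataset, [n_train, num_samples - n_train],
        generator=torch.Generator().manual_seed(seed))
    train_loader = DataLoader(train_set, batch_size=batch_size,
                              shuffle=True, num_workers=num_workers)
    val_loader = DataLoader(val_set, batch_size=batch_size,
                            shuffle=False, num_workers=num_workers)
    return train_loader, val_loader


if __name__ == "__main__":
    # Spawned workers import this file as a module, so they skip this block
    train_loader, _ = make_loaders(num_samples=320, num_workers=2)
    batch = next(iter(train_loader))
    print(batch.shape)  # torch.Size([32, 1, 32, 32, 32])
```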