I am trying to create a 3D test dataset with 10000 samples (8000 for training and 2000 for validation) to try out my 3D CNN model. Everything looks fine until I try to look at the first batch of data in my Train_dataloader using next(iter(Train_dataloader)). More specifically, it seems to run into an infinite loop, i.e., the kernel never stops.
Here is my custom dataset and how I put it into Train_dataloader and Test_dataloader:
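import numpy as np
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms

class Binary3DDataset(Dataset):
    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        # Only apply the transform if one was provided
        if self.transform is not None:
            sample = self.transform(sample)
        return sample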
# Define data augmentation
data_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),
    transforms.ToTensor(),
])
# Create 10000 random 3D samples, each with shape [1, 32, 32, 32] (channels, depth, height, width)
num_samples = 10000
voxel_sample = np.random.choice([0, 1], size=(num_samples, 1, 32, 32, 32), p=[0.7, 0.3])
# Create an instance of custom dataset with data augmentation
augumented_custom_dataset = Binary3DDataset(voxel_sample,transform = data_transform)
# Create train and test dataloader to iterate over the augmented data
batch_size = 32
train_dataset, test_dataset = random_split(augumented_custom_dataset, [8000,2000])
Train_dataloader = DataLoader(train_dataset, batch_size = batch_size, shuffle = True,num_workers=1)
Test_dataloader = DataLoader(test_dataset,batch_size = batch_size, shuffle = False,num_workers=1)
When I try to fetch the very first batch from Train_dataloader using Python's iter and next functions, it seems to get stuck in an infinite loop:
# Get one batch from the train_dataloader
data_iter = iter(Train_dataloader)
inputs = next(data_iter)
I tried printing the lengths of Train_dataloader and Test_dataloader directly, and that works without any problem. The problem only appears when I try to iterate through a dataloader.
len(train_dataset),len(test_dataset),len(Train_dataloader),len(Test_dataloader)
output:
(8000, 2000, 250, 63)
So both dataloaders do report a length. I cannot figure out why I run into an infinite loop when iterating through the dataloader.
I had the same problem. I solved it by removing num_workers from my code.
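For reference, here is a minimal sketch of what that change might look like. When num_workers is left out, DataLoader uses its default of 0 and loads batches in the main process instead of in worker subprocesses. The sketch makes a few assumptions that are not in the question: it uses only 100 samples (split 80/20) so it runs quickly, it converts each sample with torch.from_numpy, and it leaves out the torchvision transforms, since those are written for 2D images and may not accept a 4D voxel array as-is.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, random_split


class Binary3DDataset(Dataset):
    """Wraps an array of voxel grids; the transform is optional."""

    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Assumption: convert the NumPy sample to a float32 tensor here
        sample = torch.from_numpy(self.data[idx]).float()
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# Assumption: 100 samples instead of 10000, only to keep this sketch fast
voxel_sample = np.random.choice([0, 1], size=(100, 1, 32, 32, 32), p=[0.7, 0.3])
dataset = Binary3DDataset(voxel_sample)
train_dataset, test_dataset = random_split(dataset, [80, 20])

# num_workers=0 is the default: batches are loaded in the main process
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=0)

# Fetch a single batch, as in the question
inputs = next(iter(train_loader))
print(inputs.shape)
```

If this returns a batch right away, the dataset itself is fine and the hang is likely tied to the worker processes. Another quick check is to index the dataset directly, e.g. train_dataset[0], which runs __getitem__ in the main process and surfaces any error from the transform immediately.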