DataLoader with shuffle=False, but image order changes each epoch


Even though I use shuffle=False, the images come back in a different order each epoch.

Here is the code for creating the loaders:

import random

import torchvision.datasets as dset
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, SubsetRandomSampler

def create_loader_from_data_set(data_set, n_samples, batch_size, num_workers, test_size=0.2):
    indices = list(range(len(data_set)))
    selected_indices = random.sample(indices, n_samples)

    train_indices, test_indices = train_test_split(selected_indices, test_size=test_size, random_state=42)

    train_sampler = SubsetRandomSampler(train_indices)
    test_sampler = SubsetRandomSampler(test_indices)

    train_loader = DataLoader(data_set, batch_size=batch_size, num_workers=num_workers, sampler=train_sampler, shuffle=False)
    test_loader = DataLoader(data_set, batch_size=batch_size, num_workers=num_workers, sampler=test_sampler, shuffle=False)
    return train_loader, test_loader

data_set = dset.CIFAR10(root='./data/cifar10', train=True, transform=transform, download=True)
train_loader, test_loader = create_loader_from_data_set(data_set, n_samples, batch_size, num_workers)

And this for the training loop:

def train_epoch(epoch, network, loader, optimizer, batch_size):
    network.train()
    for batch_index, sample_tensor in enumerate(loader):
        batch_images, _ = sample_tensor 

I get a different order of images in each epoch (and the batches are not the same either). Shouldn't shuffle=False keep the order the same?
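A minimal repro of what I'm seeing, using a toy TensorDataset in place of CIFAR-10 (the dataset itself shouldn't matter):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Toy dataset: each "image" is just its own index, so batches reveal the order.
data_set = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.zeros(100))

indices = list(range(20))
sampler = SubsetRandomSampler(indices)
loader = DataLoader(data_set, batch_size=5, sampler=sampler, shuffle=False)

orders = []
for epoch in range(2):
    # Record the order in which sample indices come out of the loader.
    orders.append([int(x) for batch, _ in loader for x in batch.squeeze(1)])

# Both epochs visit exactly the same 20 indices...
print(sorted(orders[0]) == sorted(orders[1]) == indices)  # True
# ...but (almost surely) in a different order, despite shuffle=False.
print(orders[0], orders[1])
```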

Thanks!

I also tried passing a generator, but it didn't help:

gen = torch.Generator()

train_loader = DataLoader(data_set, batch_size=batch_size, num_workers=num_workers, sampler=train_sampler, generator=gen)
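For what it's worth, a generator passed to the DataLoader is not re-seeded between epochs, so the sampler keeps drawing fresh permutations from it. A sketch of one workaround I found (assumptions: the generator is given to the sampler itself and manually re-seeded before each epoch, with the default num_workers=0):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

data_set = TensorDataset(torch.arange(50).float().unsqueeze(1), torch.zeros(50))

gen = torch.Generator()
# SubsetRandomSampler accepts its own generator (separate from the DataLoader's).
sampler = SubsetRandomSampler(list(range(10)), generator=gen)
loader = DataLoader(data_set, batch_size=5, sampler=sampler)

orders = []
for epoch in range(2):
    gen.manual_seed(0)  # re-seed BEFORE each epoch -> identical permutation
    orders.append([int(x) for batch, _ in loader for x in batch.squeeze(1)])

print(orders[0] == orders[1])  # True: same (shuffled) order every epoch
```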

1 Answer


You should try train_test_split(..., shuffle=False), because that function shuffles by default (shuffle=True). Note also that SubsetRandomSampler draws the given indices in a new random order every epoch, and the DataLoader's shuffle argument is ignored when a sampler is supplied; so for an identical order in every epoch you also need a sampler (or subset) that preserves order.

reference -> https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split
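To make this concrete, here is a sketch (assuming a toy TensorDataset in place of CIFAR-10) combining train_test_split(..., shuffle=False) with torch.utils.data.Subset and shuffle=False in the DataLoader, which keeps the order identical across epochs:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, Subset
from sklearn.model_selection import train_test_split

data_set = TensorDataset(torch.arange(50).float().unsqueeze(1), torch.zeros(50))

selected = list(range(30))
# shuffle=False makes this a plain slice: first 80% train, last 20% test
# (random_state is ignored when shuffle=False).
train_idx, test_idx = train_test_split(selected, test_size=0.2, shuffle=False)

# Subset preserves the given index order; shuffle=False iterates sequentially.
train_loader = DataLoader(Subset(data_set, train_idx), batch_size=8, shuffle=False)

orders = []
for epoch in range(2):
    orders.append([int(x) for batch, _ in train_loader for x in batch.squeeze(1)])

print(orders[0] == orders[1] == train_idx)  # True: fixed order every epoch
```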