Running Neptune.ai in a loop


So I created a for loop to run various batch sizes, where each iteration opens and closes a Neptune run. The first iteration runs fine, but in the following ones the accuracy isn't recorded in Neptune, and Python doesn't throw an error. Can anyone think of what the problem might be?

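# Assumed imports (not shown in the original question): the neptune-client 0.x "new" API
# that provides neptune.init(), and, given the graph datasets and GCN model,
# PyTorch Geometric's DataLoader. dataset, GCN, device, train() and test() are defined elsewhere.
import torch
import neptune.new as neptune
from torch_geometric.loader import DataLoader

for i in range(len(percentage)):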

    run = neptune.init(
        project="xxx",
        api_token="xxx",
    )

    epochs = 600
    batch_perc = percentage[i]
    lr = 0.001
    sb = 64 #round((43249*batch_perc)*0.00185)
    params = {
        'lr': lr,
        'bs': sb,
        'epochs': epochs,
        'batch %': batch_perc
    }
    run['parameters'] = params

    torch.manual_seed(12345)
    td = 43249 * batch_perc
    vd = 0.1*(43249 - td) + td

    train_dataset = dataset[:round(td)]
    val_dataset = dataset[round(td):round(vd)]
    test_dataset = dataset[round(vd):]

    print(f'Number of training graphs: {len(train_dataset)}')
    run['train'] = len(train_dataset)
    print(f'Number of validation graphs: {len(val_dataset)}')
    run['val'] = len(val_dataset)
    print(f'Number of test graphs: {len(test_dataset)}')
    run['test'] = len(test_dataset)

    train_loader = DataLoader(train_dataset, batch_size=sb, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=sb, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)

    model = GCN(hidden_channels=64).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(1, epochs):
        train()
        train_acc = test(train_loader)
        run['training/batch/acc'].log(train_acc)
        val_acc = test(val_loader)
        run['training/batch/val'].log(val_acc)
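For reference, the split arithmetic in the loop gives the training set a fraction batch_perc of the 43249 graphs, the validation set 10% of what remains, and the test set everything after that. A small sketch that computes the resulting sizes (split_sizes is a hypothetical helper, not part of the original code):

```python
from typing import Tuple


def split_sizes(n_total: int, batch_perc: float) -> Tuple[int, int, int]:
    """Return (train, val, test) sizes using the same slicing rule as the loop.

    split_sizes is a hypothetical helper, not part of the original code.
    """
    td = n_total * batch_perc          # end index of the training slice
    vd = 0.1 * (n_total - td) + td     # validation gets 10% of what is left
    n_train = round(td)
    n_val = round(vd) - round(td)
    n_test = n_total - round(vd)       # the test set is everything after vd
    return n_train, n_val, n_test


for perc in (0.1, 0.5, 0.8):
    print(perc, split_sizes(43249, perc))
```

Note that Python's round() rounds exact halves to the nearest even integer, so with batch_perc = 0.5 the training slice ends at 21624, not 21625.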

Answer

Prince here,

Try calling the stop() method to stop each run before you create the next one. Currently, you create a new run object on every iteration without stopping the previous one, and that might cause problems like this.

for i in range(len(percentage)):

    run = neptune.init(
        project="xxx",
        api_token="xxx",
    )
    run['parameters'] = params

    run['train'] = len(train_dataset)
    run['val'] = len(val_dataset)
    run['test'] = len(test_dataset)
    ...

    for epoch in range(1, epochs):
        ...
        run['training/batch/acc'].log(train_acc)
        run['training/batch/val'].log(val_acc)

    run.stop()  # stop this run before the next iteration creates a new one

Docs: https://docs.neptune.ai/api-reference/run#.stop
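A slightly more defensive variant puts run.stop() in a finally block, so each run is stopped even if training raises an exception partway through. To run without a Neptune account, the sketch below uses a hypothetical FakeRun stand-in that only mimics item assignment, run[...].log(...) and stop(); the run_experiments function and the placeholder accuracy values are likewise illustrative, not Neptune's API or real results.

```python
from collections import defaultdict


class FakeRun:
    """Hypothetical stand-in for a Neptune run (not the real neptune API).

    It only mimics what the loop uses: item assignment, run[...].log(...) and stop().
    """

    def __init__(self, name):
        self.name = name
        self.fields = {}
        self.series = defaultdict(list)
        self.stopped = False

    def __setitem__(self, key, value):
        self.fields[key] = value

    def __getitem__(self, key):
        return _Series(self, key)

    def stop(self):
        self.stopped = True


class _Series:
    """Minimal series handle returned by FakeRun[...]; supports .log()."""

    def __init__(self, run, key):
        self.run = run
        self.key = key

    def log(self, value):
        if self.run.stopped:
            raise RuntimeError(f"{self.run.name} is already stopped")
        self.run.series[self.key].append(value)


def run_experiments(percentages, epochs=3):
    runs = []
    for perc in percentages:
        # In real code this would be neptune.init(project=..., api_token=...).
        run = FakeRun(name=f"run-{perc}")
        runs.append(run)
        try:
            run["parameters"] = {"batch %": perc, "epochs": epochs}
            for epoch in range(1, epochs + 1):
                train_acc = perc * epoch / epochs  # placeholder value, not a real metric
                run["training/batch/acc"].log(train_acc)
        finally:
            # Stop this run even if training raised, before the next one is created.
            run.stop()
    return runs


runs = run_experiments([0.1, 0.5])
for r in runs:
    print(r.name, r.stopped, len(r.series["training/batch/acc"]))
```

With a real Neptune run the idea is the same: whatever happens inside the loop body, the finally clause makes sure the previous run is stopped before the next one is opened.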