Neural Network parameters are not being updated


I tried to train a multi-modal model on the 2D heat equation.

CONTEXT: The best model I have so far is a CNN with a 5x5 kernel that is optimised to output temperature maps for a single given diffusion coefficient. Now I want to give the model other coefficients and feed them to a simple feedforward network, so that for any diffusion coefficient the model can produce the proper kernel that yields the right output temperature map.
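For context, the reason a convolution kernel can reproduce one step of the 2D heat equation is that an explicit finite-difference update of du/dt = alpha * laplacian(u) is itself a convolution. A minimal sketch (using the standard 3x3 Laplacian stencil for intuition only, not my learned 5x5 kernel):

import torch
import torch.nn.functional as F

def heat_step(u, alpha, dt=0.1, dx=1.0):
    # One explicit Euler step of du/dt = alpha * laplacian(u), written as a convolution.
    stencil = torch.tensor([[0., 1., 0.],
                            [1., -4., 1.],
                            [0., 1., 0.]]).view(1, 1, 3, 3)
    lap = F.conv2d(u.view(1, 1, *u.shape), stencil, padding=1).view(*u.shape)
    return u + alpha * dt / dx ** 2 * lap

u0 = torch.rand(100, 100)      # initial temperature map
u1 = heat_step(u0, alpha=0.5)  # temperature map after one time step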

PROBLEM: The two linear layers are not being optimized; the values of their parameters stay unchanged during the training process.

PyTorch code to expose the issue:

from load_model import *
import torch
import torch.nn as nn  # nn may already come from the wildcard import; made explicit here
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

n_train = 15

input_folder = f"data/ALLMAPS/iteration_no0"
output_folder = f"data/ALLMAPS/iteration_no10"
dataset = HeatDiffusion_multi_alpha_random(input_folder, output_folder)

n_samples = len(dataset)
n_test = n_samples-n_train
train_set, test_set = torch.utils.data.random_split(dataset, [n_train,n_test])

class FrozenConv2d(nn.Conv2d):
    def __init__(self):
        super().__init__(in_channels=1, out_channels=1, kernel_size=(5, 5), padding=2, padding_mode='replicate',
                         bias=False)
        self.weight.requires_grad = False  # freeze the convolution kernel
        # self.bias.requires_grad = False

    def forward(self, x):
        out = nn.functional.conv2d(x, self.weight, bias=None, padding=2)  # , self.bias)
        return out
conv_layer = FrozenConv2d()

class Smart(nn.Module):
    def __init__(self):
        super(Smart, self).__init__()
        self.l1 = nn.Linear(1,5)
        self.l2 = nn.Linear(5,25)
        self.act = nn.ReLU()

    def forward(self, x):
        alpha = x[1]
        alpha = alpha.view(-1,1)
        pre_kernel = self.act((self.l1(alpha)))
        kernel = self.l2(pre_kernel).view(1,1,5,5)

        conv_layer.weight = nn.Parameter(kernel)  # requires_grad is set to True here, otherwise backward() fails

        image = x[0].view(-1, 1, 100, 100)
        out = self.act((conv_layer(image)))
        out = out.view(100,100)

        return out

model = Smart().to(device)
model.train()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.003) #here model.parameters() only contains l1 and l2 which I want to optimize

for inputs, true_outputs in train_set:
    optimizer.zero_grad()
    inputs = [inp.to(device) for inp in inputs]
    true_outputs = true_outputs.to(device)

    # forward
    pred_outputs = model(inputs)
    loss = criterion(pred_outputs, true_outputs)

    # backwards
    loss.backward()
    optimizer.step()

print('loss', loss.item())
layers1 = [x.data for x in model.parameters()]

for inputs, true_outputs in train_set:
    optimizer.zero_grad()
    inputs = [inp.to(device) for inp in inputs]
    true_outputs = true_outputs.to(device)

    # forward
    pred_outputs = model(inputs)
    loss = criterion(pred_outputs, true_outputs)

    # backwards
    loss.backward()
    optimizer.step()

print('loss', loss.item()) #the loss is exactly the same
layers2 = [x.data for x in model.parameters()] # the parameters are exactly the same

(I initialise one of the linear layers' weights because I know these numbers work well for a certain coefficient.)

I check the values of my parameters with layers = [x.data for x in model.parameters()]; these values don't change, and neither does the loss during training (exactly the same to every decimal).

I also checked that the parameters still have requires_grad = True, and they do, so nothing is wrong there.

But the model isn't optimizing itself...

I suspect that the graph can't be built because of the way the Conv2d is used, but I have no idea how to solve the problem.
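One way to confirm this (a minimal check on the snippet above) is to look at the .grad of the linear layers right after loss.backward(); if the graph reached them they would be populated:

pred_outputs = model(inputs)
loss = criterion(pred_outputs, true_outputs)
loss.backward()

# If no gradient flows back to the linear layers, their .grad stays None
print(model.l1.weight.grad)
print(model.l2.weight.grad)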

PS: I didn't use batching because Conv2d layers are not meant to have a different kernel for each sample of the batch. So for now the only way to try my architecture was to iterate over the dataset directly instead of using a DataLoader.


There are 3 answers below.

Answer 1

Your model does not train because you called optimizer.zero_grad() after running your training samples through the model. This means you cleared out all the gradients accumulated during training before doing backpropagation via loss.backward(), hence no training occurred. To fix it, simply move optimizer.zero_grad() to the start of each training iteration, before the forward pass.

    model.train()
    for epoch in range(num_epochs):
        for inputs, true_outputs in train_set:

            ### zero out gradients before running tensors through the model
            optimizer.zero_grad()

            inputs = [inp.to(device) for inp in inputs]
            true_outputs = true_outputs.to(device)

            #forward
            pred_outputs = model(inputs)
            loss = criterion(pred_outputs, true_outputs)

            # backwards
            loss.backward()
            optimizer.step()
            if tensorboard:
                writer.add_scalar('Training loss', loss, global_step=step)
                step +=1
        if schedule:
            scheduler.step()

        if (epoch+1) % epoch_step_to_print == 0 and verbose:
            print(f'epoch {epoch+1} / {num_epochs}, loss = {loss.item():.6f}')

    return model

Answer 2

First possible cause (probably too obvious) would be that schedule is False.

Without having a clear solution, I see a few possible things to explore:

  • avoid creating modules in the forward pass (such as nn.ReLU(), FrozenConv2d(), etc.), as it can be unnecessarily costly and the modules may be created on different devices, which could cause backward issues (?) -> prefer the functional interface (see the sketch after this list), or create these modules once in the constructor,
  • maybe the absence of a batch dimension (it could be set to 1, including in the output) makes the criterion behave unexpectedly
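A minimal sketch of the functional-interface idea, reusing the l1/l2 layers from the question (the class name SmartFunctional is only for illustration):

import torch.nn as nn
import torch.nn.functional as F

class SmartFunctional(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(1, 5)
        self.l2 = nn.Linear(5, 25)

    def forward(self, x):
        alpha = x[1].view(-1, 1)
        image = x[0].view(-1, 1, 100, 100)
        kernel = self.l2(F.relu(self.l1(alpha))).view(1, 1, 5, 5)
        # Functional call: the kernel stays attached to l1/l2 in the autograd graph,
        # and no module is created or mutated inside forward.
        return F.relu(F.conv2d(image, kernel, padding=2)).view(100, 100)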
Answer 3

Finally, freezing the conv layer wasn't a good idea. The best solution is to use nn.functional.conv2d(input, kernel).

There will still be an issue as far as batching is concerned, but we can bypass it by doing a for loop over each sample of the batch (see the sketch after the code below).

Python code :

class Smart(nn.Module):
    def __init__(self, output_size=10000, xdim=100):
        super(Smart, self).__init__()
        self.xdim = xdim
        self.ydim = int(output_size/xdim)
        self.l1 = nn.Linear(1,5)
        self.l2 = nn.Linear(5,25)
        self.act = nn.ReLU()

    def forward(self, x):
        alpha = x[1]
        alpha = alpha.view(-1,1) #reformat the alpha coefficients of the batch

        pre_kernel = self.act((self.l1(alpha)))
        kernel = self.l2(pre_kernel).view(1,1,5,5)

        image = x[0].view(-1, 1, self.ydim, self.xdim)  # reformat the images of the batch
        out = self.act((nn.functional.conv2d(image, kernel, padding=2)))
        out = out.view(self.ydim, self.xdim)  # assumes a single sample (no batch dimension) in the output
        return out

model = Smart().to(device)

#loss and optimizer, scheduler, writer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

#training loop
model.train()
for epoch in range(num_epochs):
    for inputs, true_outputs in train_set:
        optimizer.zero_grad()
        inputs = [inp.to(device) for inp in inputs]
        true_outputs = true_outputs.to(device)

        #forward
        pred_outputs = model(inputs)
        loss = criterion(pred_outputs, true_outputs)

        #backwards

        loss.backward()
        optimizer.step()

    if (epoch+1) % epoch_step_to_print == 0 and verbose:
        print(f'epoch {epoch+1} / {num_epochs}, loss = {loss.item():.6f}')
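
A minimal sketch of the per-sample loop mentioned above, assuming the batch comes as stacked images and a vector of alpha coefficients (the class name SmartBatched and the argument layout are only illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SmartBatched(nn.Module):
    def __init__(self, xdim=100, ydim=100):
        super().__init__()
        self.xdim, self.ydim = xdim, ydim
        self.l1 = nn.Linear(1, 5)
        self.l2 = nn.Linear(5, 25)

    def forward(self, images, alphas):
        # images: (batch, ydim, xdim), alphas: (batch,)
        outputs = []
        for image, alpha in zip(images, alphas):
            # each sample gets its own kernel, predicted from its own alpha
            kernel = self.l2(F.relu(self.l1(alpha.view(1, 1)))).view(1, 1, 5, 5)
            out = F.relu(F.conv2d(image.view(1, 1, self.ydim, self.xdim), kernel, padding=2))
            outputs.append(out.view(self.ydim, self.xdim))
        return torch.stack(outputs)  # (batch, ydim, xdim)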