I want to update only a part of the parameters defined by torch.nn.Parameter. I have tested the following three ways, but only one works.
#(1)
import torch

class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.ones(4)
        self.P = torch.nn.Parameter(torch.ones(1))
        self.params[1] = self.P

    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()
# RuntimeError: Trying to backward through the graph a second time
#(2)
import torch

class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.P = torch.nn.Parameter(torch.ones(1))

    def forward(self, x):
        params = torch.ones(4)
        params[1] = self.P
        y = x * params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()
# It works, but the tensor has to be created and assigned in every forward pass.
#(3)
import torch

class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.nn.Parameter(torch.ones(4))

    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
net.params[1].requires_grad = False
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()
# RuntimeError: you can only change requires_grad flags of leaf variables.
I wonder how to update a part of the parameters using approaches (1) and (3).
A small note on the use of requires_grad and nn.Parameter: if you had to freeze a sub-module of your nn.Module, you would use requires_grad_ on that sub-module.
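For instance (a made-up two-layer model, only to illustrate requires_grad_ on a sub-module):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),  # sub-module '0'
    torch.nn.Linear(4, 1),  # sub-module '1'
)
# Freeze the first sub-module: every parameter inside it stops requiring grad.
model[0].requires_grad_(False)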
However, you cannot partially require gradients on a single tensor. A nn.Parameter is a wrapper which allows a given torch.Tensor to be registered inside a nn.Module. By default, the wrapped tensor will require gradient computation. You must therefore have your parameter tensor defined as:
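# as in your (3): registered via nn.Parameter, requires grad by default
self.params = torch.nn.Parameter(torch.ones(4))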
And not as:
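# as in your (1): a plain tensor attribute is not registered, so
# net.parameters() never returns it and the optimizer never updates it
self.params = torch.ones(4)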
Ultimately, you should check the content of your registered parameters with nn.Module#parameters before loading them into an optimizer.
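A quick way to see what the optimizer will actually receive (assuming net is one of the modules above):

print(list(net.parameters()))  # every registered nn.Parameter of the module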
Your first code, #(1), crashes because you are performing multiple backpropagations through the same graph without explicitly setting retain_graph to True. The following process works fine:
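A sketch of (1) where the only change is retaining the graph built in __init__ so that repeated backward calls succeed:

import torch

class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.ones(4)
        self.P = torch.nn.Parameter(torch.ones(1))
        self.params[1] = self.P  # builds a graph linking params[1] to P

    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward(retain_graph=True)  # keep the __init__-time graph alive
    optim.step()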
Your second code, #(2), is correct because you are assigning the tensor which requires gradient to a different tensor inside forward. A minimal implementation to check that the gradient is indeed computed on P is as follows:
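A minimal sketch (same module as your (2); one forward/backward pass is enough to inspect the gradient):

import torch

class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.P = torch.nn.Parameter(torch.ones(1))

    def forward(self, x):
        params = torch.ones(4)
        params[1] = self.P
        y = x * params
        return y.sum()

net = NET()
x = torch.rand(4)
net(x).backward()
print(net.P.grad)  # non-None gradient w.r.t. P; here it equals x[1]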
Your third code, #(3), is invalid because you are requiring gradient computation on part of a tensor, which is not possible: net.params[1] is a non-leaf view of the parameter, hence the RuntimeError about leaf variables. An alternative way to do it is by masking the gradient after the backpropagation has been done on the parameters:
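A sketch of that alternative, zeroing the frozen entry's gradient before each optimizer step (the index to freeze follows your question):

import torch

class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.nn.Parameter(torch.ones(4))

    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    net.params.grad[1] = 0.0  # mask: keep params[1] fixed during the update
    optim.step()

With a zero gradient from the first step (and no weight decay), Adam's update for that entry stays zero, so params[1] keeps its initial value.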