Why does:

    with torch.no_grad():
        w = w - lr*w.grad
        print(w)

result in:

    tensor(0.9871)

and

    with torch.no_grad():
        w -= lr*w.grad
        print(w)

result in:

    tensor(0.9871, requires_grad=True)
Aren't both operations the same?
Here is some test code:
    import numpy as np
    import torch

    def test_stack():
        np.random.seed(0)
        n = 50
        feat1 = np.random.randn(n, 1)
        feat2 = np.random.randn(n, 1)
        X = torch.tensor(feat1).view(-1, 1)
        Y = torch.tensor(feat2).view(-1, 1)
        w = torch.tensor(1.0, requires_grad=True)
        epochs = 1
        lr = 0.001
        for epoch in range(epochs):
            for i in range(len(X)):
                y_pred = w*X[i]
                loss = (y_pred - Y[i])**2
                loss.backward()
                with torch.no_grad():
                    #w = w - lr*w.grad  # DOESN'T WORK!!!!
                    #print(w); return
                    w -= lr*w.grad
                    print(w); return
                w.grad.zero_()
Remove the comments and you'll see requires_grad disappearing. Could this be a bug?
I had the same issue and it boggled me. I asked ChatGPT, and it turns out that normal subtraction creates a new tensor with requires_grad set to False, while augmented assignment works in-place, retaining the requires_grad property. Let's see with an example.
We will track the id of the object via the id() function, which returns an integer that's unique for every object in memory.
Normal subtraction
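Here is a minimal sketch of the normal-subtraction case; the scalar tensor x, the 0.001 step, and the dummy backward() call are just illustrative:

    import torch

    x = torch.tensor(1.0, requires_grad=True)
    (x * 2).backward()                 # populate x.grad (== 2.0)

    print(id(x), x.requires_grad)      # some id, True
    with torch.no_grad():
        x = x - 0.001 * x.grad         # normal subtraction builds a NEW tensor
    print(id(x), x.requires_grad)      # a DIFFERENT id, False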
As you can see, the subtraction produces a brand-new tensor with requires_grad set to False, and the id changes; reassigning the result to the same x handle has no effect on whether or not a new object gets created.
Now let's see augmented assignment.
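The same illustrative setup, but with the in-place update:

    import torch

    x = torch.tensor(1.0, requires_grad=True)
    (x * 2).backward()                 # populate x.grad (== 2.0)

    print(id(x), x.requires_grad)      # some id, True
    with torch.no_grad():
        x -= 0.001 * x.grad            # augmented assignment mutates x in place
    print(id(x), x.requires_grad)      # the SAME id, still True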
Now, with augmented assignment, the subtraction happens in place. This means the old object is modified, without a new one being created. Because of that, the ids before and after the subtraction remain the same: x still references the same object, and it keeps its requires_grad=True flag.
But wait, why do subtraction and augmented assignment work differently?
This is because they can be implemented with different dunder methods: x - y calls __sub__, while x -= y calls __isub__ when it is defined. I think this forum thread explains it well. The gist is that Python understands these operators differently; they're not just syntactic sugar for each other, which is why there's a discrepancy between the two seemingly identical operations.
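To illustrate the mechanism outside of PyTorch, here is a tiny hypothetical class (Counter is made up for this example) that defines both dunder methods:

    class Counter:
        def __init__(self, value):
            self.value = value

        def __sub__(self, other):
            # normal subtraction: returns a brand-new object
            return Counter(self.value - other)

        def __isub__(self, other):
            # augmented assignment: mutates self and returns it
            self.value -= other
            return self

    c = Counter(10)
    print(id(c))
    c = c - 3   # __sub__  -> new object, id changes
    print(id(c))
    c -= 3      # __isub__ -> same object, id unchanged
    print(id(c))

PyTorch's Tensor implements __isub__ as a true in-place operation, which is why w -= lr*w.grad keeps updating the original leaf tensor instead of rebinding w to a fresh one.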