Why does:

    with torch.no_grad():
        w = w - lr*w.grad
        print(w)

result in:

    tensor(0.9871)

and

    with torch.no_grad():
        w -= lr*w.grad
        print(w)

result in:

    tensor(0.9871, requires_grad=True)
Aren't both operations the same?
Here is some test code:
    import numpy as np
    import torch

    def test_stack():
        np.random.seed(0)
        n = 50
        feat1 = np.random.randn(n, 1)
        feat2 = np.random.randn(n, 1)
        X = torch.tensor(feat1).view(-1, 1)
        Y = torch.tensor(feat2).view(-1, 1)
        w = torch.tensor(1.0, requires_grad=True)
        epochs = 1
        lr = 0.001
        for epoch in range(epochs):
            for i in range(len(X)):
                y_pred = w*X[i]
                loss = (y_pred - Y[i])**2
                loss.backward()
                with torch.no_grad():
                    #w = w - lr*w.grad  # DOESN'T WORK!!!!
                    #print(w); return
                    w -= lr*w.grad
                    print(w); return
                w.grad.zero_()
Remove the comments and you'll see requires_grad disappearing. Could this be a bug?
I had the same issue and it boggled me. I asked ChatGPT, and it turns out that normal subtraction creates a new tensor with requires_grad set to False, while augmented assignment works in-place, retaining the requires_grad property. Let's see with an example.
We will track the id of the object via the id() function, which returns an integer that's unique for every object in memory.
Normal subtraction
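Here is a minimal sketch of the normal-subtraction case; the scalar tensor x, the 0.001 step, and the dummy backward() call are just illustrative:

    import torch

    x = torch.tensor(1.0, requires_grad=True)
    (x * 2).backward()                 # populate x.grad (== 2.0)

    print(id(x), x.requires_grad)      # some id, True
    with torch.no_grad():
        x = x - 0.001 * x.grad         # normal subtraction builds a NEW tensor
    print(id(x), x.requires_grad)      # a DIFFERENT id, False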
As you can see, the subtraction produces a brand-new tensor with requires_grad set to False, and the id changes; reassigning the result to the same x handle has no effect on whether or not a new object gets created.
Now let's see augmented assignment.
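The same illustrative setup, but with the in-place update:

    import torch

    x = torch.tensor(1.0, requires_grad=True)
    (x * 2).backward()                 # populate x.grad (== 2.0)

    print(id(x), x.requires_grad)      # some id, True
    with torch.no_grad():
        x -= 0.001 * x.grad            # augmented assignment mutates x in place
    print(id(x), x.requires_grad)      # the SAME id, still True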
Now, with augmented assignment, the subtraction happens in place. This means the old object is modified, without a new one being created. Because of that, the ids before and after the subtraction remain the same: x still references the same object, and it keeps its requires_grad=True flag.
But wait, why do subtraction and augmented assignment work differently?
This is because they can be implemented with different dunder methods: x - y calls __sub__, while x -= y calls __isub__ when it is defined. I think this forum thread explains it well. The gist is that Python understands these operators differently; they're not just syntactic sugar for each other, which is why there's a discrepancy between the two seemingly identical operations.
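To illustrate the mechanism outside of PyTorch, here is a tiny hypothetical class (Counter is made up for this example) that defines both dunder methods:

    class Counter:
        def __init__(self, value):
            self.value = value

        def __sub__(self, other):
            # normal subtraction: returns a brand-new object
            return Counter(self.value - other)

        def __isub__(self, other):
            # augmented assignment: mutates self and returns it
            self.value -= other
            return self

    c = Counter(10)
    print(id(c))
    c = c - 3   # __sub__  -> new object, id changes
    print(id(c))
    c -= 3      # __isub__ -> same object, id unchanged
    print(id(c))

PyTorch's Tensor implements __isub__ as a true in-place operation, which is why w -= lr*w.grad keeps updating the original leaf tensor instead of rebinding w to a fresh one.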