PyTorch: Why create multiple instances of the same type of layer?


This code is from PyTorch transformer:

    self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)
    self.dropout = Dropout(dropout)
    self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)
    self.norm1 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
    self.norm2 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
    self.norm3 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
    self.dropout1 = Dropout(dropout)
    self.dropout2 = Dropout(dropout)
    self.dropout3 = Dropout(dropout)

Why do they add self.dropout1, ...2, ...3 when self.dropout already exists and is the exact same function?

Also, what is the difference between having two layers (self.linear1, self.linear2) and reusing a single self.linear?

There are two answers below.

Accepted answer:

It is simply to keep the layers separate from one another. Each assignment such as self.dropout = Dropout(dropout) creates a new instance of the Dropout module, i.e. a distinct layer in the network.
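For example, here is a minimal sketch (the Block class is invented for illustration, not taken from the PyTorch source) showing that each assignment creates and registers a separate submodule:

    import torch.nn as nn

    class Block(nn.Module):  # hypothetical module, for illustration only
        def __init__(self, p=0.1):
            super().__init__()
            # Each nn.Dropout(p) call builds a separate module instance,
            # registered under its own attribute name.
            self.dropout1 = nn.Dropout(p)
            self.dropout2 = nn.Dropout(p)

    block = Block()
    print(block.dropout1 is block.dropout2)               # False -- two distinct objects
    print([name for name, _ in block.named_children()])   # ['dropout1', 'dropout2']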

Second answer:

In the case of Dropout, reusing the layer is not usually an issue, because a Dropout module has no learnable parameters of its own. So you could create a single self.dropout = Dropout(dropout) layer and call it multiple times in the forward function. But there may be subtle use cases which would behave differently when you do this, such as if you iterate across the layers of the network for some reason. This thread, and particularly this post, discuss this in some detail.
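As a rough sketch of what that reuse looks like (the class and dimensions below are made up, not from the PyTorch code), a single Dropout instance can be called at several points in forward, but it then appears only once when you iterate over the module's children:

    import torch
    import torch.nn as nn

    class SharedDropoutBlock(nn.Module):  # hypothetical, for illustration only
        def __init__(self, d_model=8, p=0.1):
            super().__init__()
            self.linear1 = nn.Linear(d_model, d_model)
            self.linear2 = nn.Linear(d_model, d_model)
            # One Dropout instance, reused at several points in forward().
            self.dropout = nn.Dropout(p)

        def forward(self, x):
            x = self.dropout(self.linear1(x))   # first use
            x = self.dropout(self.linear2(x))   # second use of the same instance
            return x

    block = SharedDropoutBlock()
    # The shared layer shows up only once, which matters if you iterate over
    # submodules, e.g. to set a different dropout probability per layer.
    print([name for name, _ in block.named_children()])
    # ['linear1', 'linear2', 'dropout']
    print(block(torch.randn(2, 8)).shape)       # torch.Size([2, 8])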

For the linear layer, each Linear object is characterized by a set of weights and biases. If you call it multiple times in the forward function, all the calls will share and optimize the same set of weights. This can have legitimate uses, but is not appropriate when you want multiple linear layers, each with its own set of weights and biases.
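A small demonstration of the difference (variable names here are just for illustration): two Linear objects hold independent parameters, while calling one Linear twice reuses the same weight matrix, so gradients from both calls accumulate on it:

    import torch
    import torch.nn as nn

    d_model = 4

    # Two separate layers: two independent weight matrices and biases.
    linear1 = nn.Linear(d_model, d_model)
    linear2 = nn.Linear(d_model, d_model)
    print(linear1.weight.data_ptr() == linear2.weight.data_ptr())   # False

    # One layer called twice: both calls go through the same parameters
    # (weight sharing), and both contribute to the same gradient.
    shared = nn.Linear(d_model, d_model)
    y = shared(shared(torch.randn(2, d_model)))
    y.sum().backward()
    print(shared.weight.grad.shape)   # torch.Size([4, 4]) -- a single gradient tensor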