Influence of Unused FFN on Model Accuracy in PyTorch

Hello Stack Overflow community,

I am encountering a peculiar issue with my PyTorch model: the mere presence of an initialized but unused feed-forward network (FFN) changes the model's accuracy. Specifically, when the FFN is initialized in my CRS_A class but never used in the forward pass, my model's accuracy is higher than when I completely remove (or comment out) the FFN initialization. The FFN is defined as follows in my model's constructor:

import torch.nn as nn
import torch.nn.functional as F

# CrossAttention is a custom attention module defined elsewhere in my project.

class CRS_A(nn.Module):
    def __init__(self, modal_x, modal_y, hid_dim=128, d_ff=512, dropout_rate=0.1):
        super(CRS_A, self).__init__()

        self.cross_attention = CrossAttention(modal_y, modal_x, hid_dim)
        self.ffn = nn.Sequential(  # initialized here but never called in forward()
            nn.Conv1d(modal_x, d_ff, kernel_size=1),
            nn.GELU(),
            nn.Dropout(dropout_rate),
            nn.Conv1d(d_ff, 128, kernel_size=1),
            nn.Dropout(dropout_rate),
        )
        self.norm = nn.LayerNorm(modal_x)
       
        self.linear1 = nn.Conv1d(1024, 512, kernel_size=1)
        self.linear2 = nn.Conv1d(512, 300, kernel_size=1)
        self.dropout1 = nn.Dropout(0.1)
        self.dropout2 = nn.Dropout(0.1)

    def forward(self, x, y, adj):
        # Residual connection around the cross-attention block.
        x = x + self.cross_attention(y, x, adj)      # torch.Size([5, 67, 1024])
        x = self.norm(x).permute(0, 2, 1)
        x = self.dropout1(F.gelu(self.linear1(x)))   # torch.Size([5, 512, 67])
        x_e = self.dropout2(F.gelu(self.linear2(x))) # torch.Size([5, 300, 67])

        return x_e, x

As you can see, self.ffn is never used in the forward pass. Despite this, removing or commenting out its initialization leads to a noticeable drop in accuracy.

Could this be due to some form of implicit regularization, or is there another explanation for this behavior? Has anyone encountered a similar situation, and how did you address it? Any insights or explanations would be greatly appreciated.
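
For what it's worth, one hypothesis I have not been able to confirm is that merely constructing self.ffn consumes draws from PyTorch's global random number generator, so the layers created after it (linear1, linear2) start from different initial weights, which in turn changes the training trajectory. Below is a minimal sketch of the check I have in mind; build_layers is a hypothetical helper that just mirrors the construction order in CRS_A, it is not part of my actual model:

import torch
import torch.nn as nn

def build_layers(seed, create_unused_ffn):
    # Reset the global RNG so both runs start from the same state.
    torch.manual_seed(seed)
    if create_unused_ffn:
        # Constructing these layers draws random numbers for their
        # weights, even though the module is never called afterwards.
        _ = nn.Sequential(
            nn.Conv1d(1024, 512, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(512, 128, kernel_size=1),
        )
    # The layer that *is* used ends up with different initial weights
    # depending on whether the unused FFN consumed RNG draws before it.
    return nn.Conv1d(1024, 512, kernel_size=1)

w_with = build_layers(0, create_unused_ffn=True).weight
w_without = build_layers(0, create_unused_ffn=False).weight
print(torch.equal(w_with, w_without))  # I expect this to print False

If that is what is happening, would the accuracy gap just come down to one initialization (and hence one training run) being luckier than the other, rather than any real regularization effect?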
