How should I reshape my data to feed into pytorch GRU network?


I've been having problems getting my data to fit the dimensions required by pytorch GRU.

  • My input is a 256-long float vector, in batches of 64, so the size of a batch tensor is [64, 256].
  • According to the pytorch documentation, GRU takes input of size [batch_size, sequence_length, input_size]. Now I'm not sure whether sequence_length corresponds to the length of the output sequence, nor am I sure what input_size would be here (256?).
  • My GRU is supposed to take the whole vector as an input, generate an output, and pass that output to the next GRU cell as its input. This ought to continue until a sequence of 128 outputs has been generated. My idea for the GRU network (see the picture, and the rough sketch after this list).
  • Each of the outputs will be passed through a 256 -> 42 fc layer, and a token from the alphabet of 42 will be chosen.
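
To make the intended data flow concrete, here is a rough sketch of the loop I have in mind. It is only an illustration of the idea; the names and the use of nn.GRUCell are just placeholders to show the shapes, not a working model:

import torch
import torch.nn as nn

batch_size, latent_size, seq_len, vocab_size = 64, 256, 128, 42

z = torch.randn(batch_size, latent_size)    # encoded molecules, [64, 256]
gru_cell = nn.GRUCell(latent_size, latent_size)
fc = nn.Linear(latent_size, vocab_size)     # 256 -> 42

h = torch.zeros(batch_size, latent_size)    # initial hidden state
inp = z                                     # the first input is the encoding itself
logits = []
for _ in range(seq_len):                    # 128 generation steps
    h = gru_cell(inp, h)                    # [64, 256]
    inp = h                                 # feed the output back in as the next input
    logits.append(fc(h))                    # [64, 42]
logits = torch.stack(logits, dim=1)         # [64, 128, 42]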

What this network is going to do is take a 256-long encoded vector representation of a molecule and learn to generate the corresponding SELFIES string (a text-based molecule representation), padded to a length of 128, with tokens from an alphabet of 42 'letters'.

Now, I have no idea how to reshape the input tensor so that the GRU accepts it as input, according to the drawing I attached.

Thanks in advance for your help.

I tried x = x.unsqueeze(1) on the input tensor. This gave me an output of shape [64, 1, 256], which in my model would be a batch of 64 one-token outputs.

class DecoderNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, output_len):
        super(DecoderNet, self).__init__()
        
        # GRU parameters
        self.input_size = input_size # = 256
        self.hidden_size = hidden_size # = 256
        self.num_layers = num_layers # = 1
        
        # output token count
        self.output_size = output_size # = 42
        
        # output length or GRU time steps count
        self.output_len = output_len # = 128
        
        # pytorch.nn
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=2)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        out, h = self.gru(x, h)
        return out, h
    
    def init_hidden(self, batch_size):
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        return h0

There is 1 answer below.

Best answer:

By default, nn.GRU expects (seq_len, batch_size, input_size) as input. You need to create the layer with batch_first=True to give it (batch_size, seq_len, input_size).
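
For example, with the sizes from your post (batch of 64, sequence length 128, input size 256), a quick shape check with toy tensors could look like this:

import torch
import torch.nn as nn

x = torch.randn(64, 128, 256)   # (batch_size, seq_len, input_size)

gru = nn.GRU(input_size=256, hidden_size=256, num_layers=1, batch_first=True)
out, h = gru(x)                 # out: [64, 128, 256], h: [1, 64, 256]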

If your x has a shape of (batch_size, seq_len), then you first need to add the input size dimension with

x = x.unsqueeze(2)

to get a shape of (batch_size, seq_len, input_size=1).
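
For instance, with a toy tensor just to check the shapes:

import torch

x = torch.randn(64, 128)   # (batch_size, seq_len)
x = x.unsqueeze(2)         # -> (64, 128, 1), i.e. input_size = 1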

Alternatively, you can keep batch_first=False (the default) and swap the batch and sequence length dimensions, before or after the unsqueeze(), like this:

x = x.transpose(1, 0)

Important: Do not use reshape() or view() to "fix" the shape of x (as the title of your post suggests), because that does not swap axes; it silently scrambles which values belong to which sample!
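
Here is a tiny toy example (made-up values, just to illustrate the difference):

import torch

x = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])

x.transpose(1, 0)   # tensor([[0, 3],
                    #         [1, 4],
                    #         [2, 5]])  -- axes swapped, each column is still one row of x

x.reshape(3, 2)     # tensor([[0, 1],
                    #         [2, 3],
                    #         [4, 5]])  -- same element order, but rows now mix values from different rows of x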