How should I reshape my data to feed into pytorch GRU network?


I've been having problems getting my data to fit the dimensions required by pytorch GRU.

  • My input is a 256-long float vector, in batches of 64, so the size of a batch tensor is [64, 256].
  • According to the pytorch documentation, GRU takes input of size [batch_size, sequence_length, input_size]. Now I'm not sure whether sequence_length corresponds to the length of the output sequence, nor am I sure what input_size would be here (256?).
  • My GRU is supposed to take the whole vector as an input, generate an output, and pass that output to the next GRU cell as its input. This ought to continue until a sequence of 128 outputs has been generated. My idea for the GRU network (see the picture, and the rough sketch after this list).
  • Each of the outputs will be passed through a 256 -> 42 fc layer, and a token from the alphabet of 42 will be chosen.
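
To make the intended data flow concrete, here is a rough sketch of the loop I have in mind. It is only an illustration of the idea; the names and the use of nn.GRUCell are just placeholders to show the shapes, not a working model:

import torch
import torch.nn as nn

batch_size, latent_size, seq_len, vocab_size = 64, 256, 128, 42

z = torch.randn(batch_size, latent_size)    # encoded molecules, [64, 256]
gru_cell = nn.GRUCell(latent_size, latent_size)
fc = nn.Linear(latent_size, vocab_size)     # 256 -> 42

h = torch.zeros(batch_size, latent_size)    # initial hidden state
inp = z                                     # the first input is the encoding itself
logits = []
for _ in range(seq_len):                    # 128 generation steps
    h = gru_cell(inp, h)                    # [64, 256]
    inp = h                                 # feed the output back in as the next input
    logits.append(fc(h))                    # [64, 42]
logits = torch.stack(logits, dim=1)         # [64, 128, 42]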

What this network is going to do is take a 256-long encoded vector representation of a molecule and learn to generate the corresponding SELFIES string (a text-based molecule representation), padded to a length of 128, with tokens from an alphabet of 42 'letters'.

Now, I have no idea how to reshape the input tensor so that the GRU accepts it as input, according to the drawing I attached.

Thanks in advance for your help.

I tried x = x.unsqueeze(1) on the input tensor. This gave me an output of shape [64, 1, 256], which in my model would be a batch of 64 one-token outputs.

class DecoderNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, output_len):
        super(DecoderNet, self).__init__()
        
        # GRU parameters
        self.input_size = input_size # = 256
        self.hidden_size = hidden_size # = 256
        self.num_layers = num_layers # = 1
        
        # output token count
        self.output_size = output_size # = 42
        
        # output length or GRU time steps count
        self.output_len = output_len # = 128
        
        # pytorch.nn
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=2)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        out, h = self.gru(x, h)
        return out, h
    
    def init_hidden(self, batch_size):
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        return h0

There is 1 answer below.

Best answer:

By default, nn.GRU expects (seq_len, batch_size, input_size) as input. You need to create the layer with batch_first=True to give it (batch_size, seq_len, input_size).
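
For example, with the sizes from your post (batch of 64, sequence length 128, input size 256), a quick shape check with toy tensors could look like this:

import torch
import torch.nn as nn

x = torch.randn(64, 128, 256)   # (batch_size, seq_len, input_size)

gru = nn.GRU(input_size=256, hidden_size=256, num_layers=1, batch_first=True)
out, h = gru(x)                 # out: [64, 128, 256], h: [1, 64, 256]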

If your x has a shape of (batch_size, seq_len), then you first need to add the input size dimension with

x = x.unsqueeze(2)

to get a shape of (batch_size, seq_len, input_size=1).
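
For instance, with a toy tensor just to check the shapes:

import torch

x = torch.randn(64, 128)   # (batch_size, seq_len)
x = x.unsqueeze(2)         # -> (64, 128, 1), i.e. input_size = 1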

Alternatively, you can keep batch_first=False (the default) and swap the batch and sequence length dimensions, before or after the unsqueeze(), like this:

x = x.transpose(1, 0)

Important: Do not use reshape() or view() to "fix" the shape of x (as the title of your post suggests), because that does not swap axes; it silently scrambles which values belong to which sample!
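
Here is a tiny toy example (made-up values, just to illustrate the difference):

import torch

x = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])

x.transpose(1, 0)   # tensor([[0, 3],
                    #         [1, 4],
                    #         [2, 5]])  -- axes swapped, each column is still one row of x

x.reshape(3, 2)     # tensor([[0, 1],
                    #         [2, 3],
                    #         [4, 5]])  -- same element order, but rows now mix values from different rows of x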