Probable issue with LSTM in lasagne


With the simple LSTM constructor given in the tutorial and an input of shape [batch, sequence, 1], one would expect an output of shape [batch, sequence, num_units]. But regardless of the num_units passed at construction, the output has the same shape as the input.

The following minimal code reproduces the issue:

    import lasagne
    import theano
    import theano.tensor as T
    import numpy as np

    num_batches= 20
    sequence_length= 100
    data_dim= 1
    train_data_3= np.random.rand(num_batches,sequence_length,data_dim).astype(theano.config.floatX)

    #As in the tutorial
    forget_gate = lasagne.layers.Gate(b=lasagne.init.Constant(5.0))
    l_lstm = lasagne.layers.LSTMLayer(
                                     (num_batches,sequence_length, data_dim), 
                                     num_units=8,
                                     forgetgate=forget_gate
                                     )

    lstm_in= T.tensor3(name='x', dtype=theano.config.floatX)

    lstm_out = lasagne.layers.get_output(l_lstm, {l_lstm:lstm_in})
    f = theano.function([lstm_in], lstm_out)
    lstm_output_np= f(train_data_3)

    lstm_output_np.shape
    #= (20, 100, 1)

An unqualified LSTM (I mean one in its default mode) should produce one output for each unit, right? The code was run on kaixhin's CUDA Lasagne Docker image. What gives? Thanks!

You can fix that by feeding the input through a lasagne.layers.InputLayer:

    import lasagne
    import theano
    import theano.tensor as T
    import numpy as np

    num_batches= 20
    sequence_length= 100
    data_dim= 1
    train_data_3= np.random.rand(num_batches,sequence_length,data_dim).astype(theano.config.floatX)

    #As in the tutorial
    forget_gate = lasagne.layers.Gate(b=lasagne.init.Constant(5.0))
    input_layer = lasagne.layers.InputLayer(shape=(num_batches, # <-- change
                  sequence_length, data_dim),)  # <-- change
    l_lstm = lasagne.layers.LSTMLayer(input_layer,  # <-- change
                                     num_units=8,
                                     forgetgate=forget_gate
                                     )

    lstm_in= T.tensor3(name='x', dtype=theano.config.floatX)

    lstm_out = lasagne.layers.get_output(l_lstm, lstm_in)  # <-- change
    f = theano.function([lstm_in], lstm_out)
    lstm_output_np= f(train_data_3)

    print(lstm_output_np.shape)
    #= (20, 100, 8)

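If you feed your input into the input_layer, it is no longer ambiguous, so you do not even need to specify where the input is supposed to go. Specifying a shape directly and mapping the tensor3 to the LSTM layer itself does not work: when get_output is given a dict, the expression mapped to a layer is used in place of that layer's output, so {l_lstm: lstm_in} simply returns lstm_in and the LSTM is never applied.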
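
To see why the two call patterns differ, here is a small pure-Python sketch of the lookup rule. It is a toy model, not Lasagne's actual code: ToyLayer and toy_get_output are hypothetical names, and "outputs" are represented only by their shapes. It assumes, as the get_output documentation describes, that an expression mapped to a layer in the dict replaces that layer's regular output.

```python
# Toy model of dict-based output resolution. NOT Lasagne's real implementation:
# ToyLayer and toy_get_output are hypothetical names used for illustration only.

class ToyLayer(object):
    def __init__(self, incoming, num_units=None):
        # `incoming` is either another ToyLayer or a plain shape tuple
        self.input_layer = incoming if isinstance(incoming, ToyLayer) else None
        self.num_units = num_units

    def output_shape_for(self, input_shape):
        # Input-like layers pass the shape through;
        # recurrent-like layers replace the last axis with num_units
        if self.num_units is None:
            return input_shape
        return input_shape[:-1] + (self.num_units,)


def toy_get_output(layer, known):
    """Return the 'output' (here just a shape) of `layer`.

    `known` maps layers to already-known outputs. A layer found there is
    never computed; its mapped value is returned as-is.
    """
    if layer in known:
        return known[layer]
    if layer.input_layer is None:
        raise ValueError("no input available for this layer")
    return layer.output_shape_for(toy_get_output(layer.input_layer, known))


data_shape = (20, 100, 1)

# Pattern from the question: the LSTM itself is the dict key,
# so its output is overridden by the raw input.
lstm_alone = ToyLayer(data_shape, num_units=8)
shape_question = toy_get_output(lstm_alone, {lstm_alone: data_shape})

# Pattern from the answer: the data is bound to an input layer feeding the LSTM.
input_layer = ToyLayer(data_shape)
lstm_stacked = ToyLayer(input_layer, num_units=8)
shape_answer = toy_get_output(lstm_stacked, {input_layer: data_shape})

print(shape_question)  # (20, 100, 1)
print(shape_answer)    # (20, 100, 8)
```

In the first pattern the lookup stops at the LSTM layer and hands back whatever was mapped to it, which matches the (20, 100, 1) shape reported in the question. In the second, the lookup stops at the input layer, so the LSTM is actually applied.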