Seq to Seq model training


I have a couple of questions:

  1. In a seq-to-seq model with varying input lengths, if you don't use an attention mask, won't the RNN end up computing hidden state values for the padded elements? Does that mean an attention mask is mandatory, or else my output will be wrong?
  2. How do I deal with varying-length labels then? Let's say I have padded them so they can be passed in a batch. Now I don't want the padded elements to have an impact on my loss, so how do I ignore them? (See the sketch below for what I mean.)
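
For context, here is a minimal sketch of what I mean by padding the labels so they fit in one batch (the pad id of 0 and the shapes are just hypothetical examples):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Three label sequences of different lengths.
labels = [
    [5, 12, 7, 2],        # length 4
    [9, 3, 2],            # length 3
    [4, 8, 1, 6, 11, 2],  # length 6
]

# Pad at the end with 0 so they stack into a single (3, 6) array.
padded = pad_sequences(labels, padding="post", value=0)
print(padded.shape)  # (3, 6)

# The worry: a token-level loss averaged over all positions would also
# include the 0-padded positions, unless they are masked out somehow.
```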

There is 1 best solution below

  1. No, not necessarily. An RNN takes a time series and computes a hidden state at every time step, but you can force the RNN to stop and not compute hidden state values for the padded elements.

You can use a dynamic RNN for that. Read about it here: What is a dynamic RNN in TensorFlow?
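
For illustration, a minimal sketch of passing the true sequence lengths to a dynamic RNN (this assumes the TF1-style `tf.nn.dynamic_rnn` API the linked question discusses; in TF 2.x the same call is available under `tf.compat.v1`, and the shapes here are made up):

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

batch_size, max_time, feature_dim, hidden_dim = 2, 5, 3, 4

inputs = tf.placeholder(tf.float32, [batch_size, max_time, feature_dim])
# True (unpadded) length of each sequence in the batch.
seq_len = tf.placeholder(tf.int32, [batch_size])

cell = tf.nn.rnn_cell.GRUCell(hidden_dim)
# sequence_length makes dynamic_rnn stop once each sequence ends:
# outputs past that point are zeros, and the returned state is the one
# from the last real (non-padded) step.
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=seq_len, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = np.random.randn(batch_size, max_time, feature_dim).astype(np.float32)
    out = sess.run(outputs, feed_dict={inputs: x, seq_len: [5, 3]})
    print(out[1, 3:])  # zeros: steps 3 and 4 of the second sequence were padding
```

So the hidden state simply stops being updated at each sequence's real end, rather than being contaminated by the padded steps.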