I'm trying to implement an RNN autoencoder and I was wondering if attention would improve my results.
My end goal is to build a document similarity search engine, and I'm looking for ways to encode the documents.
As I'm not interested in training the decoder but rather the encoder, does attention add value to the process?
As the length of your sequences grows beyond what the RNN can effectively encode, performance keeps dropping.
The attention mechanism lets the RNN focus, step by step, on the most relevant subsequences, so in the ideal case performance is no longer constrained by the maximum sequence length. The effectiveness of attention for NLP applications such as machine translation is well established.
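To make that concrete, here is a minimal NumPy sketch of attention pooling over an encoder's hidden states (the names `attention_pool`, `h` and `q` are purely illustrative, not from any particular library): a learned query scores each timestep, and the resulting weights collapse the states into one fixed-size document vector, which is the kind of thing you would index for similarity search.

    import numpy as np

    def softmax(x):
        # numerically stable softmax
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention_pool(h, q):
        # h: (seq_len, hidden_dim) hidden states from the RNN encoder
        # q: (hidden_dim,) query vector (learned end-to-end in a real model)
        scores = h @ q             # one relevance score per timestep
        weights = softmax(scores)  # attention weights, sum to 1
        return weights @ h         # fixed-size document vector

    h = np.random.randn(50, 128)       # toy example: 50 timesteps, 128-dim states
    q = np.random.randn(128)
    doc_vector = attention_pool(h, q)  # usable as an embedding for similarity search

In a trained model the scores would come from the encoder itself rather than random arrays, but the pooling step is the same.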
In this context there is a tradeoff to consider: the whole attention model is trained end-to-end with gradient descent, and the attention weights form a matrix of shape
(len(input_seq), len(output_seq))
so training them has a runtime that is quadratic in the sequence lengths. Attention will therefore be most useful when your sequences are long enough that the gain outweighs that extra cost. In any case, there is active research on reducing this runtime. For paper references and more detail, I recommend the 3rd week's videos of Andrew Ng's Coursera course on sequence models (access is free). The course also covers a Keras implementation of an attention model, with some nice plots.
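As a rough illustration of where the quadratic cost comes from, the sketch below (dot-product scores, made-up lengths, NumPy only) materialises the full score matrix between input and output positions; the number of entries grows with the product of the two lengths:

    import numpy as np

    len_input, len_output, dim = 100, 80, 64
    encoder_states = np.random.randn(len_input, dim)   # one state per input token
    decoder_states = np.random.randn(len_output, dim)  # one state per output token

    # every output step attends over every input step, so the score matrix
    # has len(input_seq) * len(output_seq) entries -- the quadratic cost above
    scores = encoder_states @ decoder_states.T          # shape (len_input, len_output)
    scores -= scores.max(axis=0, keepdims=True)         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    print(weights.shape)                                 # (100, 80)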
Hope this helps! Cheers,
Andres