I am really confused about how to create a seq2seq NLP model based on a transformer, with BERT as the encoder.
Could you please advise whether the process below is even correct? (If you know of any materials that would help me figure this out, please share them. I've also put a rough code sketch of my current understanding right after the list.)
[encoder]
- create the seq2seq transformer layer
- define the BERT attention
- pool the encoder output
[decoder]
- create the transformer decoder
[pool]
- concatenate the encoder and decoder outputs
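To make the question concrete, here is a minimal sketch of what I *think* the above should look like, assuming PyTorch and the Hugging Face transformers library. The class name MySeq2Seq and all hyperparameters are placeholders I made up. My (possibly wrong) understanding is that the decoder cross-attends to the full sequence of encoder hidden states, rather than pooling or concatenating anything:

```python
# Rough sketch of my current understanding (PyTorch + Hugging Face
# `transformers`). `MySeq2Seq`, `vocab_size`, and the hyperparameters
# are placeholders I made up, not names from any official tutorial.
import torch.nn as nn
from transformers import BertModel


class MySeq2Seq(nn.Module):
    def __init__(self, vocab_size, d_model=768, nhead=8, num_layers=6):
        super().__init__()
        # [encoder] pretrained BERT; keep its full hidden-state sequence.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # [decoder] a plain transformer decoder that cross-attends to the
        # encoder hidden states (does this replace my pool/concatenate steps?).
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Full sequence of encoder states, NOT the pooled [CLS] vector.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        # A real model would also add positional encodings here.
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask so each target token only sees earlier target tokens.
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(src_ids.device)
        out = self.decoder(tgt=tgt, memory=memory, tgt_mask=causal,
                           memory_key_padding_mask=(src_mask == 0))
        return self.lm_head(out)  # logits over the target vocabulary
```

From skimming the Hugging Face docs, their EncoderDecoderModel class seems to wire up something like this out of the box (the "bert2bert" warm-starting setup), so maybe that is the intended route instead of building it by hand?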
I would really appreciate any advice or tips, even just a pointer to a reference!