Using Transformer's decoder to extract sentences


My question relates directly to the text summarization task. I know there are a number of implementations based on RNNs (especially LSTMs) that use sentence-level attention to extract salient sentences from the source with an attentive LSTM decoder. I have been digging into whether this is possible with Transformer networks, specifically the Transformer's decoder part, but I don't really have an idea of how to incorporate it.

Take the LSTM case as an example: the LSTM encoder produces contextualized encodings for the sentences in the source, and its last hidden state is passed to the LSTM decoder. At each decoding timestep, the decoder attends over the source sentence encodings to obtain an attention score for each of them. These scores and the sentence hidden states are then combined into a context vector, which is further processed together with the decoder hidden state to predict the next sentence to pick.
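To make that concrete, here is a minimal numerical sketch of one such attention step, using plain dot-product scoring; the function name and shapes are my own illustration (many papers use an additive/Bahdanau score instead), not any particular implementation:

```python
import numpy as np

def sentence_attention(decoder_state, sentence_encodings):
    """One decoding step of sentence-level attention (dot-product scoring).

    decoder_state:       (d,)   current decoder hidden state
    sentence_encodings:  (n, d) one contextual vector per source sentence
    Returns the attention distribution over sentences and the context vector.
    """
    scores = sentence_encodings @ decoder_state     # (n,) one score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over sentences
    context = weights @ sentence_encodings          # (d,) weighted sum of encodings
    return weights, context

# toy example: 3 source sentences, hidden size 4
rng = np.random.default_rng(0)
enc = rng.normal(size=(3, 4))
h = rng.normal(size=(4,))
w, c = sentence_attention(h, enc)
# w sums to 1 (a distribution over source sentences); c has shape (4,)
```

The context vector `c` is what would then be concatenated with (or added to) the decoder state to predict the next sentence to select.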

Assuming I have obtained the sentence encodings using a Transformer encoder, I'm wondering how to carry this scenario from the LSTM decoder over to the Transformer decoder side.

A related question: how are these networks (both LSTM- and Transformer-based) trained when they use sentence-level attention instead of word-level attention?


Update: the behaviour I intend to achieve is as follows. I want the Transformer's decoder to take in sentences (instead of tokens, which would make it abstractive summarization), compute attention over the source sentences conditioned on the partial summary selected in prior timesteps, and then output a probability distribution over the source sentences denoting how probable it is that each sentence is copied into the target. So, to make it explicit, I'm looking for an extractive summarizer with a decoder.
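The behaviour I'm describing can be sketched as a greedy pointer-style loop: score all source sentences against the current decoder state, emit a distribution, copy the argmax, and feed that sentence's encoding back in. Everything here is a toy stand-in (the matrix `W` is a hypothetical recurrence; a real model would use an LSTM cell or a Transformer decoder layer for the state update):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def extract_sentences(sentence_encodings, init_state, W, num_steps):
    """Greedy pointer-style extraction over source sentences.

    sentence_encodings: (n, d) one contextual vector per source sentence
    init_state:         (d,)   initial decoder state
    W:                  (d, d) hypothetical state-update matrix (illustration only)
    Returns the indices of the sentences copied into the summary, in order.
    """
    state = init_state
    picked, already = [], set()
    for _ in range(num_steps):
        scores = sentence_encodings @ state
        for i in already:          # mask sentences already copied
            scores[i] = -np.inf
        probs = softmax(scores)    # distribution over source sentences
        i = int(np.argmax(probs))
        picked.append(i)
        already.add(i)
        # feed the chosen sentence's encoding back into the decoder state
        state = np.tanh(W @ state + sentence_encodings[i])
    return picked

rng = np.random.default_rng(1)
enc = rng.normal(size=(5, 8))
picked = extract_sentences(enc, rng.normal(size=(8,)), rng.normal(size=(8, 8)), 3)
# picked holds 3 distinct source-sentence indices
```

Training such a model presumably amounts to cross-entropy on `probs` against oracle sentence labels at each step, but that is exactly the part of my question I'm unsure about.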
