I am building a chatbot with a sequence-to-sequence encoder-decoder model, as in NMT. From the data given, I understand that during training the decoder outputs are fed back in as decoder inputs, along with the encoder cell states. What I cannot figure out is what I should feed into the decoder when I actually deploy the chatbot in real time, since at that point the output is exactly what I have to predict. Can someone help me out with this, please?
Seq2Seq Models for Chatbots
651 Views, asked by Subham Mukherjee
The exact answer depends on which building blocks you take from the Neural Machine Translation (NMT) model and which ones you replace with your own. I assume the graph structure is exactly as in NMT.
If so, at inference time, you can feed just a vector of zeros to the decoder.
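To make this concrete, here is a minimal NumPy sketch of the inference loop (illustrative only: the toy sizes, the `decoder_step` function, and the weight matrices are invented for the example, not taken from NMT). The first decoder input is just a zero vector; after that, each step is fed the embedding of the token predicted at the previous step.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 6, 4, 8          # toy sizes (assumptions for the sketch)
E = rng.normal(size=(VOCAB, EMB))  # embedding matrix
W = rng.normal(size=(EMB + HID, HID))
V = rng.normal(size=(HID, VOCAB))  # output projection

def decoder_step(x, h):
    """One toy RNN decoder step: returns logits over the vocab and the new state."""
    h = np.tanh(np.concatenate([x, h]) @ W)
    return h @ V, h

h = np.zeros(HID)        # stands in for the encoder's final state
x = np.zeros(EMB)        # inference-time first input: just zeros
tokens = []
for _ in range(5):       # greedy decoding loop
    logits, h = decoder_step(x, h)
    tok = int(np.argmax(logits))   # greedy: take the most likely token
    tokens.append(tok)
    x = E[tok]           # feed the predicted token's embedding back in

print(tokens)
```

In a real model you would also stop once an end-of-sequence token is produced, rather than after a fixed number of steps.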
Internal details: NMT uses an entity called `Helper` to determine the next input to the decoder (see the `tf.contrib.seq2seq.Helper` documentation). In particular, `tf.contrib.seq2seq.BasicDecoder` relies solely on the helper when it performs a step: the `next_inputs` fed into the subsequent cell are exactly the return value of `Helper.next_inputs()`.

There are different implementations of the `Helper` interface, e.g.:

- `tf.contrib.seq2seq.TrainingHelper` returns the next decoder input, which is usually the ground truth. This helper is used in training, as indicated in the tutorial.
- `tf.contrib.seq2seq.GreedyEmbeddingHelper` discards the inputs and instead returns the `argmax` token from the previous output. NMT uses this helper in inference when the `sampling_temperature` hyper-parameter is 0.
- `tf.contrib.seq2seq.SampleEmbeddingHelper` does the same, but samples the token according to a categorical (a.k.a. generalized Bernoulli) distribution. NMT uses this helper in inference when `sampling_temperature > 0`.

The code is in the `BaseModel._build_decoder` method. Note that both `GreedyEmbeddingHelper` and `SampleEmbeddingHelper` don't care what the decoder input is, so in fact you can feed anything, but the zero tensor is the standard choice.
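The division of labor can be sketched in plain Python. This mirrors the contract, not the real `tf.contrib.seq2seq` API: the class names, the `next_inputs` signature, and the toy `step_fn` below are simplified stand-ins. The point is that the decode loop only ever asks the helper for its next input, so swapping the training helper for a greedy one is the only change between training and inference.

```python
import numpy as np

class TrainingHelper:
    """Training: the next input is the ground-truth token's embedding (teacher forcing)."""
    def __init__(self, embedding, targets):
        self.embedding, self.targets = embedding, targets
    def next_inputs(self, step, outputs):
        return self.embedding[self.targets[step]]

class GreedyHelper:
    """Inference: ignore ground truth, embed the argmax of the previous output."""
    def __init__(self, embedding):
        self.embedding = embedding
    def next_inputs(self, step, outputs):
        return self.embedding[int(np.argmax(outputs))]

def decode(step_fn, helper, first_input, state, max_steps):
    """Minimal analogue of BasicDecoder: the loop never inspects the data itself,
    it only asks the helper what to feed into the next step."""
    x, outs = first_input, []
    for t in range(max_steps):
        logits, state = step_fn(x, state)
        outs.append(logits)
        x = helper.next_inputs(t, logits)
    return outs

# Toy usage: a linear "decoder cell" with invented weights.
rng = np.random.default_rng(1)
E = rng.normal(size=(5, 3))    # 5-token vocab, 3-dim embeddings
Wv = rng.normal(size=(3, 5))
def step_fn(x, state):
    return x @ Wv + state, state
outs = decode(step_fn, GreedyHelper(E), np.zeros(3), 0.0, 4)
```

With `TrainingHelper(E, targets)` in place of `GreedyHelper(E)`, the same `decode` loop performs teacher forcing instead of greedy inference.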