Decoder targets required for RNN inference

173 Views Asked by At

I have been trying to run some experiments using the deepfix tool (https://bitbucket.org/iiscseal/deepfix) which is a seq2seq model for correcting common programming errors. I made changes to the code so that it is compatible to TF-1.12, as the original code contains tensorflow.contrib.seq2seq functions which are not supported in version TF-1.12 (only in TF-1.0.x).

The main changes were in the seq2seq_model defined in neural_net/train.py. Below is the changed code. I'm new to the tensorflow RNN, and coded the decoder part using help from online codes.

class seq2seq_model():

PAD = 0
EOS = 1

def __init__(self, vocab_size, embedding_size, max_output_seq_len,
             cell_type='LSTM', memory_dim=300, num_layers=4, dropout=0.2,
             attention=True,
             scope=None,
             verbose=False):

    assert 0 <= dropout and dropout <= 1, '0 <= dropout <= 1, you passed dropout={}'.format(
        dropout)

    tf.set_random_seed(1189)

    self.attention = attention
    self.max_output_seq_len = max_output_seq_len

    self.memory_dim = memory_dim
    self.num_layers = num_layers
    self.dropout = dropout
    self.scope = scope

    if dropout != 0:
        self.keep_prob = tf.placeholder(tf.float32)
    else:
        self.keep_prob = None

    self.vocab_size = vocab_size
    self.embedding_size = embedding_size

    self.encoder_cell = _new_RNN_cell(
        memory_dim, num_layers, cell_type, dropout, self.keep_prob)
    self.decoder_cell = _new_RNN_cell(
        memory_dim, num_layers, cell_type, dropout, self.keep_prob)

    self._make_graph()

    if self.scope is not None:
        saver_vars = [var for var in tf.global_variables(
        ) if var.name.startswith(self.scope)]
    else:
        saver_vars = tf.global_variables()

    if verbose:
        print 'root-scope:', self.scope
        print "\n\nDiscovered %d saver variables." % len(saver_vars)
        for each in saver_vars:
            print each.name

    self.saver = tf.train.Saver(saver_vars, max_to_keep=5)

@property
def decoder_hidden_units(self):
    return self.memory_dim

def _make_graph(self):
    self._init_placeholders()

    self._init_decoder_train_connectors()

    self._init_embeddings()

    self._init_simple_encoder()

    self._init_decoder()

    self._init_optimizer()

def _init_placeholders(self):
    """ Everything is time-major """
    self.encoder_inputs = tf.placeholder(
        shape=(None, None),
        dtype=tf.int32,
        name='encoder_inputs',
    )
    self.encoder_inputs_length = tf.placeholder(
        shape=(None,),
        dtype=tf.int32,
        name='encoder_inputs_length',
    )

    self.decoder_targets = tf.placeholder(
        shape=(None, None),
        dtype=tf.int32,
        name='decoder_targets'
    )
    self.decoder_targets_length = tf.placeholder(
        shape=(None,),
        dtype=tf.int32,
        name='decoder_targets_length',
    )

def _init_decoder_train_connectors(self):

    with tf.name_scope('decoderTrainFeeds'):
        sequence_size, batch_size = tf.unstack(
            tf.shape(self.decoder_targets), name='decoder_targets_shape')

        EOS_SLICE = tf.ones([1, batch_size], dtype=tf.int32) * self.EOS
        PAD_SLICE = tf.ones([1, batch_size], dtype=tf.int32) * self.PAD

        self.decoder_train_inputs = tf.concat(
            [EOS_SLICE, self.decoder_targets], axis=0, name="decoder_train_inputs")
        self.decoder_train_length = self.decoder_targets_length + 1

        decoder_train_targets = tf.concat(
            [self.decoder_targets, PAD_SLICE], axis=0)
        decoder_train_targets_seq_len, _ = tf.unstack(
            tf.shape(decoder_train_targets))
        decoder_train_targets_eos_mask = tf.one_hot(self.decoder_train_length - 1,
                                                    decoder_train_targets_seq_len,
                                                    on_value=self.EOS, off_value=self.PAD,
                                                    dtype=tf.int32)
        decoder_train_targets_eos_mask = tf.transpose(
            decoder_train_targets_eos_mask, [1, 0])

        decoder_train_targets = tf.add(decoder_train_targets,
                                       decoder_train_targets_eos_mask, name="decoder_train_targets")

        self.decoder_train_targets = decoder_train_targets

        self.loss_weights = tf.ones([
            batch_size,
            tf.reduce_max(self.decoder_train_length)
        ], dtype=tf.float32, name="loss_weights")

def _init_embeddings(self):
    with tf.variable_scope("embedding") as scope:
        sqrt3 = math.sqrt(3)
        initializer = tf.random_uniform_initializer(-sqrt3, sqrt3)

        self.embedding_matrix = tf.get_variable(
            name="embedding_matrix",
            shape=[self.vocab_size, self.embedding_size],
            initializer=initializer,
            dtype=tf.float32)

        self.encoder_inputs_embedded = tf.nn.embedding_lookup(
            self.embedding_matrix, self.encoder_inputs,
            name="encoder_inputs_embedded")

        self.decoder_train_inputs_embedded = tf.nn.embedding_lookup(
            self.embedding_matrix, self.decoder_train_inputs,
            name="decoder_train_inputs_embedded")

def _init_simple_encoder(self):
    with tf.variable_scope("Encoder") as scope:
        (self.encoder_outputs, self.encoder_state) = (
            tf.nn.dynamic_rnn(cell=self.encoder_cell,
                              inputs=self.encoder_inputs_embedded,
                              sequence_length=self.encoder_inputs_length,
                              time_major=True,
                              dtype=tf.float32)
        )

def _init_decoder(self):
    with tf.variable_scope("decoder") as scope:
        # def output_fn(outputs):
        #     return tf.contrib.layers.fully_connected(outputs, self.vocab_size, scope=scope,
        #                                                 name = "output_fn")

        sequence_size, batch_size = tf.unstack(
            tf.shape(self.decoder_targets), name='decoder_targets_shape')

        train_helper = seq2seq.TrainingHelper(
                inputs=self.decoder_train_inputs_embedded,
                sequence_length=self.decoder_train_length,
                time_major=True,
                name="train_helper")


        pred_helper = seq2seq.SampleEmbeddingHelper(
                embedding=self.embedding_matrix,
                start_tokens=tf.ones([batch_size], dtype=tf.int32) * self.EOS,
                end_token=self.EOS)
                # name="pred_helper")

        def _decode(helper, scope, reuse=None):
            with tf.variable_scope(scope, reuse=reuse):
                attention_states = tf.transpose(
                    self.encoder_outputs, [1, 0, 2])

                attention_mechanism = seq2seq.BahdanauAttention(
                num_units=self.decoder_hidden_units, memory=attention_states,
                name="attention_mechanism")

                attention_cell = seq2seq.AttentionWrapper(
                self.decoder_cell, attention_mechanism,
                name="atttention_wrapper")

                out_cell = tf.contrib.rnn.OutputProjectionWrapper(
                    attention_cell, self.vocab_size, reuse=reuse)
                    # name="output_cell")

                decoder = seq2seq.BasicDecoder(
                    cell=out_cell, helper=helper,
                    initial_state=out_cell.zero_state(
                        dtype=tf.float32, batch_size=batch_size))
                        # name="decoder")

                outputs = seq2seq.dynamic_decode(
                    decoder=decoder, output_time_major=True,
                    impute_finished=True)
                    # name="outputs")

                return outputs



        (self.decoder_logits_train, self.decoder_state_train, _) = _decode(train_helper, "decoder")
        (self.decoder_logits_inference, self.decoder_state_inference, _) = _decode(pred_helper, "decoder", reuse=True)

        self.decoder_logits_train = self.decoder_logits_train.rnn_output
        self.decoder_logits_inference = self.decoder_logits_inference.rnn_output
        # self.decoder_logits_train = output_fn(self.decoder_outputs_train)

        self.decoder_prediction_train = tf.argmax(
            self.decoder_logits_train, axis=-1, name='decoder_prediction_train')

        scope.reuse_variables()

        self.decoder_prediction_inference = tf.argmax(self.decoder_logits_inference, axis=-1,
                                                      name='decoder_prediction_inference')


def _init_optimizer(self):
    logits = tf.transpose(self.decoder_logits_train, [1, 0, 2])
    targets = tf.transpose(self.decoder_train_targets, [1, 0])
    self.loss = seq2seq.sequence_loss(logits=logits, targets=targets,
                                      weights=self.loss_weights)

    self.optimizer = tf.train.AdamOptimizer()
    gvs = self.optimizer.compute_gradients(self.loss)

    def ClipIfNotNone(grad):
        if grad is None:
            return grad
        return tf.clip_by_value(grad, -1., 1)

    # capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
    capped_gvs = [(ClipIfNotNone(grad), var) for grad, var in gvs]

    self.train_op = self.optimizer.apply_gradients(capped_gvs)

def make_feed_dict(self, x, x_len, y, y_len):
    feed_dict = {
        self.encoder_inputs: x,
        self.encoder_inputs_length: x_len,

        self.decoder_targets: y,
        self.decoder_targets_length: y_len,
    }

    if self.dropout != 0:
        feed_dict.update({self.keep_prob: 1.0 - self.dropout})

    return feed_dict

def load_parameters(self, sess, filename):
    self.saver.restore(sess, filename)

def save_parameters(self, sess, filename, global_step=None):
    self.saver.save(sess, filename, global_step=global_step)

def train_step(self, session, x, x_len, y, y_len):
    feed_dict = self.make_feed_dict(x, x_len, y, y_len)
    _, loss = session.run([self.train_op, self.loss], feed_dict)
    return loss

def validate_step(self, session, x, x_len, y, y_len):
    feed_dict = self.make_feed_dict(x, x_len, y, y_len)
    loss, decoder_prediction, decoder_train_targets = session.run([self.loss,
                                                                   self.decoder_prediction_inference,
                                                                   self.decoder_train_targets], feed_dict)
    return loss, np.array(decoder_prediction).T, np.array(decoder_train_targets).T

def sample(self, session, X, X_len):
    feed_dict = {self.encoder_inputs: X,
                 self.encoder_inputs_length: X_len}

    if self.dropout != 0:
        feed_dict.update({self.keep_prob: 1.0})

    decoder_prediction = session.run(
        self.decoder_prediction_inference, feed_dict)
    return np.array(decoder_prediction).T

I am having some problems with this code:

  1. Main problem - The seq2seq.train_step() and seq2seq.validate_step() functions are working, but when I use seq2seq.sample() for actually making inferences, I get an error that asks me to feed a value for decoder_targets. This is an unexpected behaviour as the SampleEmbeddingHelper function is used for inference which does not require decoder_targets. The error:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'ids/decoder_targets' with dtype int32 and shape [?,?] [[node ids/decoder_targets (defined at .../code/neural_net/train.py:241) = Placeholderdtype=DT_INT32, shape=[?,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

  1. When I try to use the GreedyEmbeddingHelper instead of SampleEmbeddingHelper, and then run decoder_logits_inference op, the machine hangs and runs out of memory after some time. Although SampleEmbeddingHelper works fine.
1

There are 1 best solutions below

1
On
  1. Well, SampleEmbeddingHelper does need decoder targets, since it mixes part of GreedyEmbeddingHelper(infer mode) and tf.contrib.seq2seq.TrainingHelper(teacher forcing). I think you just need to use GreedyEmbeddingHelper.

  2. Since in the beginning, the parameters are totally random (if not pre-trained). Maybe you have seen that the results of the first few loops of seq2seq model are totally messed up. So if you use GreedyEmbeddingHelper, which outputs a result based on the previous one, and of course no one teaches it "where to stop", so it usually goes infinitely until your memory runs out. To solve this, you need to set an upper limit for the length of sentence in tf.contrib.seq2seq.dynamic_decode. The argument is maximum_iterations. as shown in tf.contrib.seq2seq.dynamic_decode