While fine-tuning GPT-2 for text generation, the custom loss function receives the PAD_TOKEN_ID as a label


While training, the custom loss function receives the PAD_TOKEN_ID as a label, resulting in the error below. 50257 is both the PAD_TOKEN_ID and the vocabulary size of GPT-2.

InvalidArgumentError: {{function_node __wrapped__SparseSoftmaxCrossEntropyWithLogits_device_/job:localhost/replica:0/task:0/device:CPU:0}} Received a label value of 50257 which is outside the valid range of [0, 50257).  Label values: 389 1976 1437 264 649 24867 1762 503 5633 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 50257 5025...

To remove this I tried masking both the labels and the logits. The labels have shape (1260,) before masking and (132,) after masking. The logits have shape (1260, 50257) before masking, but after masking they come out as (63323820,), i.e. (1260 * 50257,) flattened into one dimension. The code I am using to mask the logits is as follows:

shift_logits = logits[..., :-1, :]                                             # drop the last position so logits line up with next-token labels
shift_logits = tf.reshape(shift_logits, [-1, shift_logits.shape[-1]])          # (1260, 50257)
mask_logits = tf.math.logical_not(tf.math.equal(shift_logits, pad_token_id))   # element-wise compare, so the mask is also (1260, 50257)
shift_logits_masked = tf.boolean_mask(shift_logits, mask_logits)               # a 2-D mask flattens the result to 1-D
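For comparison, here is a minimal sketch (with small toy tensors standing in for my real data, so the numbers are only illustrative) of building the mask from the labels instead of the logits, so that the same 1-D boolean mask can be applied to both tensors and their first dimensions stay aligned:

import tensorflow as tf

pad_token_id = 50257
vocab_size = 50258  # assuming the embedding was resized after adding the pad token
labels = tf.constant([389, 1976, 1437, pad_token_id, pad_token_id])  # (5,) toy labels
logits = tf.random.normal([5, vocab_size])                           # (5, vocab) toy logits

mask = tf.not_equal(labels, pad_token_id)       # (5,) boolean mask built from the labels
labels_masked = tf.boolean_mask(labels, mask)   # (3,)
logits_masked = tf.boolean_mask(logits, mask)   # (3, vocab) - whole rows are kept, so the shapes still match
print(labels_masked.shape, logits_masked.shape)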

So the primary problem is that the label value 50257 is being passed to the loss, and when I try to remove it by masking both the logits and the labels, the masking fails because the two tensors end up with different shapes. This is probably a simple question, but I am running out of ideas, so it would be really helpful if someone could have a look.

I tried masking both the labels and the logits, but as mentioned above the labels have shape (1260,) and the logits (1260, 50257), so whenever I apply tf.boolean_mask it fails with a shape-mismatch error. I expect to calculate the loss as shown below:

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
loss = loss_fn(shift_labels_masked, shift_logits_masked)
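As a sanity check on the shapes (again with toy values, not my real tensors), Reduction.NONE returns one loss value per remaining token, so the result still has to be averaged, which is what the tf.reduce_mean in the training loop below is for:

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
labels = tf.constant([1, 0, 2])           # (3,) toy labels
logits = tf.random.normal([3, 4])         # (3, 4) toy logits with a matching first dimension
per_token_loss = loss_fn(labels, logits)  # (3,) one loss value per token
print(per_token_loss.shape, tf.reduce_mean(per_token_loss).numpy())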

Since this is text generation, in my training loop I am passing the input_ids as the labels, as shown below:

for epoch in range(num_epochs):
  for batch in train_ds:
    input_ids = batch["input_ids"]
    with tf.GradientTape() as tape:
      outputs = model(input_ids)
      # loss_fn here is my custom loss function, not the Keras loss object above
      loss = loss_fn(outputs, labels=batch["input_ids"], pad_token_id=tokenizer.pad_token_id)
      loss = tf.reduce_mean(loss)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    #if optimizer.iterations % 100 == 0:
    print("Epoch {} Batch {} Loss {:.4f}".format(epoch + 1, optimizer.iterations.numpy(), loss.numpy()))
