I'm trying to fine-tune Hugging Face's implementation of DistilBERT for multi-class classification (100 classes) on a custom dataset, following the tutorial at https://huggingface.co/transformers/custom_datasets.html.
I'm fine-tuning in native TensorFlow, i.e. I use the following part of the tutorial for dataset creation:
import tensorflow as tf

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
))
And this one for fine-tuning:
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=100)
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
Everything seems to go fine with fine-tuning, but when I try to predict on the test dataset (2000 examples) with model.predict(test_dataset), the model seems to yield one prediction per token rather than one prediction per sequence.
That is, instead of getting an output of shape (1, 2000, 100), I get an output of shape (1, 1024000, 100), where 1024000 is the number of test examples (2000) times the sequence length (512).
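For reference, the prediction step is essentially the following (a rough sketch of what I run; the comment shows what I observe versus what I expect):
import numpy as np

preds = model.predict(test_dataset)  # test_dataset built as above, 2000 examples
print(np.asarray(preds).shape)       # expected something like (2000, 100),
                                     # observed (1, 1024000, 100) = (1, 2000 * 512, 100)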
Any hint on what's going on here? (Sorry if this is naive, I'm very new to TensorFlow.)
I had exactly the same problem. I do not know why it happens, since it looks like the right code going by the tutorial.
But for me it worked to create NumPy arrays out of the train_encodings and pass them directly to the fit method instead of creating the Dataset.
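Something along these lines worked for me — a rough sketch, assuming your encodings were padded to a uniform length and contain input_ids and attention_mask (DistilBERT does not use token_type_ids):
import numpy as np

# Convert the tokenizer output to plain NumPy arrays
# (assumes every example was padded/truncated to the same length)
train_x = {
    'input_ids': np.array(train_encodings['input_ids']),
    'attention_mask': np.array(train_encodings['attention_mask']),
}
train_y = np.array(train_labels)

model.fit(train_x, train_y, epochs=3, batch_size=16)

# Same idea at prediction time: pass arrays instead of the tf.data.Dataset
test_x = {
    'input_ids': np.array(test_encodings['input_ids']),
    'attention_mask': np.array(test_encodings['attention_mask']),
}
preds = model.predict(test_x)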