distilbert ktrain 'too many values to unpack'


I am trying to run DistilBERT with ktrain in Colab but I am getting the error "too many values to unpack". I am trying to perform toxic comment classification: I uploaded 'train.csv' from CivilComments, and I am able to run BERT but not DistilBERT.

#prerequisites:
!pip install ktrain
import ktrain
from ktrain import text as txt
DATA_PATH = '/content/train.csv'
NUM_WORDS = 50000 
MAXLEN = 150 
label_columns = ["toxic", "severe_toxic", "obscene", 
                 "threat", "insult", "identity_hate"]

It works fine if I just preprocess with 'bert', but then I cannot use the distilbert model. When preprocessing with distilbert I get the error:

 (x_test, y_test), preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN,  preprocess_mode='distilbert')

'too many values to unpack, expected 2'. If I substitute distilbert with bert it works fine (code below), but then I am forced to use bert as the model. Preprocessing with bert works fine:

(x_train, y_train), (x_test, y_test), preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN,  preprocess_mode='bert')

No error on this one, but then I cannot use distilbert, see below:

For example, calling

model = txt.text_classifier('distilbert', train_data=(x_train, y_train), preproc=preproc)

raises the error message: if 'bert' is selected model, then preprocess_mode='bert' should be used and vice versa

I want to use

(x_test, y_test), preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN, preprocess_mode='distilbert')

with the distilbert model. How can I avoid the error 'too many values to unpack'?

Link on which the code is based: Arun Maiya (2019). ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks. https://towardsdatascience.com/ktrain-a-lightweight-wrapper-for-keras-to-help-train-neural-networks-82851ba889c


1 Answer


As shown in this example notebook, the texts_from_* functions return TransformerDataset objects (not NumPy arrays) when preprocess_mode='distilbert' is specified. So, you'll need to do something like this:

trn, val, preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN,  preprocess_mode='distilbert')
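The returned trn and val objects, together with preproc, are then passed straight to the model and the learner. A minimal sketch of the remaining steps, assuming the standard ktrain workflow; the batch size and learning rate below are illustrative, not prescribed:

# build a DistilBERT classifier from the preprocessed TransformerDataset
model = txt.text_classifier('distilbert', train_data=trn, preproc=preproc)
# wrap model and data in a ktrain Learner (batch_size here is illustrative)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
# train for one epoch with the 1cycle policy (learning rate is illustrative)
learner.fit_onecycle(3e-5, 1)

Because trn and val are TransformerDataset objects rather than (x, y) tuples, there is no separate x/y unpacking anywhere in this flow, which is what avoids the 'too many values to unpack' error.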