I'm trying to predict on a test sequence using ktrain with a DistilBERT model. My code looks like this:
import ktrain
from ktrain import text

# With preprocess_mode='distilbert', trn and val are TransformerDataset objects
trn, val, preproc = text.texts_from_array(x_train=x_train, y_train=y_train,
                                          x_test=x_test, y_test=y_test,
                                          class_names=train_b.target_names,
                                          preprocess_mode='distilbert',
                                          maxlen=350)
model = text.text_classifier('distilbert', train_data=trn, preproc=preproc, multilabel=True)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=64)
# This is the call that fails
y_pred = learner.model.predict(val, verbose=0)
With the other ktrain models, such as nbsvm, fasttext and bigru, this is quite easy because texts_from_array returns NumPy arrays. With distilbert, however, it returns a TransformerDataset, so predicting with learner.model.predict() is not possible: it raises a Python index exception. I also can't use the validate() method to generate a confusion matrix, because I have a multi-label classification problem. How can I predict on a test sequence with ktrain using distilbert? I need this because my metric function is built on the sklearn.metrics library, which needs the test and validation sequences in NumPy format.
You can use a Predictor instance as shown in the tutorial. The Predictor simply uses the preproc object to transform the raw text into the format expected by the model and feeds this to the model.
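As a rough sketch of that approach (not taken from the tutorial itself): the helper below assumes a Predictor created with ktrain.get_predictor(learner.model, preproc) and calls its predict() method with return_proba=True to get per-class probabilities for raw texts. The helper name predict_multilabel and the 0.5 threshold are my own illustrative choices, not part of ktrain.

```python
import numpy as np


def predict_multilabel(predictor, texts, threshold=0.5):
    """Return (probabilities, 0/1 indicator matrix) as NumPy arrays.

    `predictor` is assumed to be a ktrain Predictor, e.g. the result of
    ktrain.get_predictor(learner.model, preproc). The function name and
    the default threshold are hypothetical, chosen for illustration.
    """
    # return_proba=True asks for class probabilities instead of label names.
    raw = predictor.predict(list(texts), return_proba=True)
    # Force a 2-D float array of shape (n_texts, n_classes).
    probs = np.atleast_2d(np.asarray(raw, dtype=float))
    # Multi-label: each class is decided independently of the others.
    y_pred = (probs >= threshold).astype(int)
    return probs, y_pred


# Usage with the objects from the question (requires ktrain and a trained model):
# predictor = ktrain.get_predictor(learner.model, preproc)
# probs, y_pred = predict_multilabel(predictor, x_test)
# from sklearn.metrics import f1_score
# print(f1_score(y_test, y_pred, average='micro'))
```

Note that the Predictor takes the raw texts (x_test), not the TransformerDataset, since it applies preproc itself. The resulting y_pred is a plain NumPy indicator matrix, so it can be passed to sklearn.metrics functions together with y_test, provided y_test uses the same multi-label indicator layout.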