What is the meaning of this error from ktrain's preprocess_train() function (Python, deep learning)?


I am trying to use a deep learning model to build a sentiment analysis project. For this I am using the ktrain package, but the problem is in preprocess_train().

The function has the signature def preprocess_train(texts, y=None, mode='train', verbose=1) and is documented as follows:

Args:
    texts (list of strings): text of documents
    y: labels
    mode (str):  If 'train' and prepare_for_learner=False,
                 a tf.Dataset will be returned with repeat enabled
                 for training with fit_generator
    verbose(bool): verbosity
Returns:
  TransformerDataset if self.use_with_learner = True else tf.Dataset
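Going by the docstring (and the error message further down), texts is expected to hold raw text: either a list of strings, or a list of (str, str) sentence-pair tuples, with y holding one label per document. The snippet below is only a sketch of those input shapes; the tweets and labels are made-up placeholders, and the ktrain call is left commented out so the snippet runs without ktrain installed.

```python
# Made-up placeholder data illustrating the input shapes the docstring describes.
texts = ["I loved this movie", "This was a waste of time"]  # list of strings
labels = ["POSITIVE", "NEGATIVE"]                            # one label per text

# Sentence-pair input, as mentioned in ktrain's error message: (str, str) tuples.
pairs = [("first sentence", "second sentence"), ("another one", "and its pair")]

# Every document is a Python str.
assert all(isinstance(doc, str) for doc in texts)
# Every pair is a 2-tuple of str.
assert all(
    isinstance(p, tuple) and len(p) == 2 and all(isinstance(s, str) for s in p)
    for p in pairs
)

# With ktrain installed and t = text.Transformer(...), the call would look like:
# trn = t.preprocess_train(texts, labels)
```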

Based on the ktrain user guide, I did the following:

Code:

import ktrain
from ktrain import text
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn import metrics

MODEL_NAME = 'aubmindlab/bert-base-arabertv01'  # AraBERT checkpoint
t = text.Transformer(MODEL_NAME, maxlen=128)
# texts: the Tweet column; labels: 'NEGATIVE' / 'POSITIVE'
trn = t.preprocess_train(X_train_smote.Tweet.values, y_train_smote)
val = t.preprocess_test(X_test.Tweet.values, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=32)

where:

X_train_smote.Tweet.values --> array([1830, 471, 1100, ..., 1308, 930, 868])

type(X_train_smote.Tweet.values) --> numpy ndarray

y_train_smote --> array(['NEGATIVE', 'NEGATIVE', 'POSITIVE', ..., 'POSITIVE', 'POSITIVE', 'POSITIVE'], dtype=object)

type(y_train_smote) --> numpy ndarray
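Note that the values printed for X_train_smote.Tweet.values are integers, not tweet text. A quick way to confirm what preprocess_train actually receives is to inspect the type of the first element. This sketch uses a stand-in array built from the first three values shown above, not the real dataset.

```python
import numpy as np

# Stand-in for X_train_smote.Tweet.values, using the first values shown above.
tweets = np.array([1830, 471, 1100])

first = tweets[0]
print(type(first))             # a NumPy integer type, not str
print(isinstance(first, str))  # False
```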

The code crashes with the following error:

preprocessing train...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-78dde2289830> in <module>()
      6 MODEL_NAME = 'aubmindlab/bert-base-arabertv01'# using the Arabert
      7 t = text.Transformer(MODEL_NAME, maxlen=128)
----> 8 trn = t.preprocess_train(X_train_smote.Tweet.values, y_train_smote)
      9 val = t.preprocess_test(X_test.Tweet.values, y_test)
     10 model = t.get_classifier()

2 frames
/usr/local/lib/python3.7/dist-packages/ktrain/text/preprocessor.py in detect_text_format(texts)
    231         is_pair = _is_sentence_pair(peek)
    232         if not is_pair and not isinstance(peek, str):
--> 233             raise ValueError(err_msg)
    234     return is_array, is_pair
    235 

ValueError: invalid text format: texts should be list of strings or list of sentence pairs in form of tuples (str, str)
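The traceback shows the check that fails (lines 231-233 of ktrain/text/preprocessor.py): the first element of texts ("peek") must be either a str or a sentence pair, otherwise the ValueError is raised. The sketch below is a hypothetical, simplified mirror of only that visible logic; check_text_format and is_sentence_pair are illustrative names, and the real _is_sentence_pair implementation is not shown in the traceback, so its behavior here is an assumption. Under this reading, an array of integer IDs like the one above fails the check, which suggests the Tweet column needs to contain the original tweet strings.

```python
import numpy as np

ERR_MSG = ("invalid text format: texts should be list of strings or list of "
           "sentence pairs in form of tuples (str, str)")


def is_sentence_pair(item):
    # Simplified assumption: a pair is a tuple/list of exactly two strings.
    return (isinstance(item, (tuple, list)) and len(item) == 2
            and all(isinstance(s, str) for s in item))


def check_text_format(texts):
    """Hypothetical mirror of the check visible in the traceback:
    look at the first element and reject anything that is neither
    a str nor a sentence pair."""
    peek = texts[0]
    is_pair = is_sentence_pair(peek)
    if not is_pair and not isinstance(peek, str):
        raise ValueError(ERR_MSG)
    return is_pair


print(check_text_format(["good", "bad"]))   # False: plain strings are accepted
print(check_text_format([("a", "b")]))      # True: sentence pairs are accepted
try:
    check_text_format(np.array([1830, 471, 1100]))  # integer IDs
except ValueError as err:
    print("ValueError:", err)
```

NumPy string arrays still pass this check, because elements of np.array(["a", "b"]) are np.str_, a subclass of str.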
