I am trying to use a deep learning model to build a sentiment analysis project.
I am using the ktrain package, but I run into a problem when calling preprocess_train().
According to its docstring, the function has the following signature and parameters: def preprocess_train(texts, y=None, mode='train', verbose=1)
Args:
texts (list of strings): text of documents
y: labels
mode (str): If 'train' and prepare_for_learner=False,
a tf.Dataset will be returned with repeat enabled
for training with fit_generator
verbose(bool): verbosity
Returns:
TransformerDataset if self.use_with_learner = True else tf.Dataset
Based on the ktrain user guide, I wrote the following code:
import ktrain
from ktrain import text
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
from sklearn import metrics
MODEL_NAME = 'aubmindlab/bert-base-arabertv01'
t = text.Transformer(MODEL_NAME, maxlen=128)
trn = t.preprocess_train(X_train_smote.Tweet.values, y_train_smote)
val = t.preprocess_test(X_test.Tweet.values, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=32)
where:
X_train_smote.Tweet.values
--> array([1830, 471, 1100, ..., 1308, 930, 868])
type(X_train_smote.Tweet.values)
--> numpy ndarray
y_train_smote
--> array(['NEGATIVE', 'NEGATIVE', 'POSITIVE', ..., 'POSITIVE', 'POSITIVE',
'POSITIVE'], dtype=object)
type(y_train_smote)
--> numpy ndarray
The call to preprocess_train() fails with the error below:
preprocessing train...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-81-78dde2289830> in <module>()
6 MODEL_NAME = 'aubmindlab/bert-base-arabertv01'# using the Arabert
7 t = text.Transformer(MODEL_NAME, maxlen=128)
----> 8 trn = t.preprocess_train(X_train_smote.Tweet.values, y_train_smote)
9 val = t.preprocess_test(X_test.Tweet.values, y_test)
10 model = t.get_classifier()
2 frames
/usr/local/lib/python3.7/dist-packages/ktrain/text/preprocessor.py in detect_text_format(texts)
231 is_pair = _is_sentence_pair(peek)
232 if not is_pair and not isinstance(peek, str):
--> 233 raise ValueError(err_msg)
234 return is_array, is_pair
235
ValueError: invalid text format: texts should be list of strings or list of sentence pairs in form of tuples (str, str)
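
To narrow this down, here is a minimal sketch of the check that the traceback points at. It is not ktrain's actual code: first_element_is_text is a hypothetical helper that only mirrors the condition visible in detect_text_format (the first element must be a str or a (str, str) tuple), and the lists are short stand-ins copied from the output above rather than the full arrays.

```python
def first_element_is_text(texts):
    """Simplified stand-in for the check shown in the traceback:
    the first element must be a str or a (str, str) tuple."""
    peek = texts[0]
    is_pair = (
        isinstance(peek, tuple)
        and len(peek) == 2
        and all(isinstance(part, str) for part in peek)
    )
    return is_pair or isinstance(peek, str)


# Stand-in values copied from the output above (not the full arrays).
encoded_tweets = [1830, 471, 1100, 1308, 930, 868]
labels = ["NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE", "POSITIVE", "POSITIVE"]

# Hypothetical raw tweets, only to show an input that passes the check.
raw_tweets = ["example tweet one", "example tweet two"]

print(first_element_is_text(encoded_tweets))  # integers, so the check fails
print(first_element_is_text(raw_tweets))      # strings, so the check passes
```

If this reading is right, the texts argument is receiving integer-encoded values (X_train_smote.Tweet.values contains numbers such as 1830 and 471), while preprocess_train expects the raw tweet strings. The labels in y_train_smote are already strings, so they do not appear to be what triggers the error.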