Simpletransformers always generating empty strings


So I was trying to train a chatbot using transformers for my AI assistant, and I thought the simpletransformers package in Python would help me speed up a lot of my tasks. I found a good dataset on Kaggle (https://www.kaggle.com/datasets/arnavsharmaas/chatbot-dataset-topical-chat) to train my chatbot. I loaded the data, did some preprocessing, and transformed it into one column `input_text` and another `target_text`, as mentioned in the docs. Then I trained my model with encoder type roberta and decoder type bert, as that's what is selected by default (and I saw in the docs that it cannot be changed). I trained it on the first 1k samples to see if the code was working. On the first try I gave it one line from the dataset, and it just spammed the word "my"; the result was `mymymymymy`. I restarted my runtime and trained again, and this time it always generates an empty string. I was expecting proper results. Here are the code snippets:

Loading and preprocessing the data:

import pandas as pd
df=pd.read_csv("../input/chatbot-dataset-topical-chat/topical_chat.csv")
#converting to required format
new_df={"input_text":[],'target_text':[]}
for i in range(0,df.shape[0]):
    if i%2==0:
        new_df['input_text'].append(df['message'][i])
    else:
        new_df['target_text'].append(df['message'][i])
new_df=pd.DataFrame(new_df)
new_df.head()
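As a side note, the pairing loop above assumes an even number of rows: if the message count is odd, `input_text` and `target_text` end up with different lengths and the `DataFrame` constructor raises an error. A vectorized sketch that truncates to the shorter side (using a small made-up frame in place of the Kaggle CSV):

```python
import pandas as pd

# Small made-up frame standing in for topical_chat.csv
df = pd.DataFrame({"message": ["hi", "hello", "how are you", "fine", "bye"]})

# Even-indexed rows are inputs, odd-indexed rows are the replies
inputs = df["message"].iloc[::2].reset_index(drop=True)
targets = df["message"].iloc[1::2].reset_index(drop=True)

# Truncate to the shorter side so the two columns always line up
n = min(len(inputs), len(targets))
new_df = pd.DataFrame({"input_text": inputs[:n], "target_text": targets[:n]})
print(new_df)
```

With an odd number of messages, the trailing unpaired message ("bye" above) is simply dropped.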

It works up till here. The code for training the transformer is:

!pip install simpletransformers
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs


model_args = Seq2SeqArgs()
model_args.num_train_epochs = 3
model_args.overwrite_output_dir = True
model = Seq2SeqModel(
    "roberta",
    "roberta-base",
    "bert-base-cased",
    args=model_args,
)
model.train_model(new_df.head(1000))

Here are the results (training output screenshot omitted).

Finally, I asked it to predict a sample from the dataframe. As I said, it once spammed a word, and after the restart it produces an empty string. Can anyone help me, please?

1 Answer
If it is a seq2seq mapping problem, I find that BART does a better job than RoBERTa.

Also, preferably use more samples in your training data. Here is a script with BART to get you started:

import logging
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Set up data for training (expects "input_text" and "target_text" columns)
df = pd.read_excel('sample_input.xlsx')

train_size = int(len(df)*0.8)

train_df, eval_df = df[:train_size], df[train_size:]
train_df, eval_df = train_df[['input_text', 'target_text']], eval_df[['input_text', 'target_text']]

# Configure the model
model_args = Seq2SeqArgs()
model_args.num_train_epochs = 10
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.evaluate_generated_text = True
model_args.evaluate_during_training = True
model_args.evaluate_during_training_verbose = True
model_args.overwrite_output_dir = True

model = Seq2SeqModel(
     encoder_decoder_type="bart",
     encoder_decoder_name="facebook/bart-large",
     use_cuda=True,
     args=model_args)

# Train the model
model.train_model(train_df, eval_data=eval_df)

# Evaluate the model
result = model.eval_model(eval_df)

# Use the model for prediction
print(model.predict(['Hi']))