Issues with BertLMDataBunch.from_raw_corpus(): 'charmap' codec can't encode character '\u0627' in position 0: character maps to <undefined>
When creating a BertLMDataBunch object, I get this error: 'charmap' codec can't encode character '\u0627' in position 0: character maps to <undefined>. When I tried encoding my texts as UTF-8, I got this error instead: 'charmap' codec can't encode characters in position 20-25: character maps to <undefined>. I also tried avoiding punctuation and special characters such as 'éèêçàôûù', but I got the same error.
df_train is my labeled dataset, and Description is the column containing the French texts.
DATA_PATH = Path('./data/')
all_texts = df_train['Description'].to_list()
all_texts = [ (x.encode('utf-8', errors='ignore')).decode('utf-8', errors='ignore') for x in all_texts]
The texts also contain numbers.
The object I created generates a text file, lm_trained, that contains text like this:
Bonjour Le 21 Avril 2021 j ai envoy� une r�clamation
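A minimal sketch that may help narrow this down, assuming the default encoding on the machine is a legacy Windows code page such as cp1252 (an assumption; nothing above confirms the platform). The sample string and the file name are hypothetical. It illustrates that a UTF-8 encode/decode round trip leaves a Python str unchanged, that encoding to cp1252 is one way to get a 'charmap' error, and that cp1252 bytes read back as UTF-8 produce the replacement character seen above.

```python
import os
import tempfile

# Hypothetical sample: French accents plus the Arabic letter '\u0627'.
sample = "j ai envoyé une réclamation \u0627"

# A UTF-8 encode/decode round trip gives back the same str,
# so on its own it cannot change what gets written later.
round_trip = sample.encode("utf-8", errors="ignore").decode("utf-8", errors="ignore")

# Encoding to a legacy code page fails on characters it does not contain.
try:
    sample.encode("cp1252")
    charmap_error = None
except UnicodeEncodeError as exc:
    charmap_error = str(exc)

# Bytes written as cp1252 but decoded as UTF-8 show up as U+FFFD.
garbled = "envoyé".encode("cp1252").decode("utf-8", errors="replace")

# Writing and reading with an explicit encoding avoids the platform default.
path = os.path.join(tempfile.mkdtemp(), "sample_utf8.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)
with open(path, encoding="utf-8") as f:
    read_back = f.read()
```

If this reproduces the same message, the problem is probably in how the output file is opened for writing rather than in the strings themselves. Whether BertLMDataBunch.from_raw_corpus() exposes a way to set the file encoding is not something this sketch assumes.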
Can anyone help me fix this? Thank you!