How can I remove escaping '\' in my string to decode encoded letters?

I'm working on a project with a dataset coming from Board Game Geek.

The issue concerns the names of the games I'm studying. I think something went wrong with the encoding, so the CSV file I received contains escaped letters. For example: Orl\u00e9ans instead of Orléans.

When I import the CSV in Python, they stay like that, and I want to correct these letters.

I think I found the right function for this:

>>> import unicodedata
>>> unicodedata.normalize("NFD", 'Orl\u00e9ans')
'Orléans'

The problem is that I can't apply this function in a for loop over the data.
The string is displayed as 'Orl\u00e9ans', but it is actually 'Orl\\u00e9ans' (a literal backslash followed by u00e9), so the function has nothing to convert.
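
To make the difference between the two strings concrete, here is a small sketch (the variable names are only for illustration). In the first literal Python resolves the escape itself; the second is what the CSV actually holds.

```python
# A literal with a single backslash: Python resolves the escape while parsing.
resolved = 'Orl\u00e9ans'
# What the CSV contains: a real backslash followed by the text "u00e9".
raw = 'Orl\\u00e9ans'

print(resolved, len(resolved))  # 7 characters
print(raw, len(raw))            # 12 characters
print(resolved == raw)          # False: they are different strings
```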

Is there any way to correct this? I have 20000 entries in the dataset, so I can't correct them one by one.
Thank you

EDIT: I found the answer in this post: Process escape sequences in a string in Python

>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3 
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs
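
One caveat with the `unicode_escape` route: it decodes the bytes as Latin-1, so a name that already contains a correctly decoded non-ASCII letter (for example "ü") may come out garbled. A possible alternative, sketched below, replaces only the literal \uXXXX sequences with a regular expression. The helper `decode_u_escapes` and the sample list are hypothetical, not part of the original dataset, and the sketch does not combine surrogate pairs for characters outside the Basic Multilingual Plane.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_u_escapes(text):
    """Replace literal \\uXXXX sequences with the characters they name."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Hypothetical sample of game names; in practice this would be the CSV column.
names = ['Orl\\u00e9ans', 'Catan', 'Zürich \\u00e0 la carte']
fixed = [decode_u_escapes(name) for name in names]
print(fixed)  # ['Orléans', 'Catan', 'Zürich à la carte']
```

Names without any escape sequence pass through unchanged, so the same comprehension can be run over all 20000 entries.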

Thanks a lot

There is 1 solution below

I would try to use latin1 encoding as follows:

import codecs
with codecs.open(r'$(path to your csv file)', encoding='latin1') as f:
    text = f.read()  # file contents decoded as latin1