This is what I need to decode
\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab
It is generated by String.fromCharCode(arrayPw[i]);
but I don't understand how to decode it :(
Please help
Duplicate of this: https://stackoverflow.com/a/70815136/5902698
You load a dataset and it contains some strange characters. Example:
'戴森美å�‘é€\xa0型器完整版套装Dyson Airwrap HS01(铜金色礼盒版)'
In my case, I know that the strange characters are Chinese. So I can figure that whoever sent me the data encoded it in UTF-8, but somewhere along the way it was read as 'ISO-8859-1'.
So as a first step I encode the string back with 'ISO-8859-1', then I decode it with UTF-8. My lines are:
_encoding = 'ISO-8859-1'
_my_str.encode(_encoding, 'ignore').decode("utf-8", 'ignore')
Then my output is:
"'森Dyson Airwrap HS01礼'"
This works for me, but I guess I don't really understand what happens under the hood. So feel free to tell me if you have further information.
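On the "under the hood" part, here is a minimal sketch of what is probably going on. It assumes the garbling came from UTF-8 bytes being decoded as ISO-8859-1 somewhere upstream; the sample string is made up for illustration and is not taken from the dataset.

```python
# Assumption: the garbling came from UTF-8 bytes being decoded as ISO-8859-1.
original = "戴森 Dyson"  # illustrative sample, not from the real data

# Each of the two Chinese characters takes 3 bytes in UTF-8.
utf8_bytes = original.encode("utf-8")

# ISO-8859-1 maps every byte to exactly one character, so this decode never
# fails, but each 3-byte character becomes 3 unrelated characters (mojibake).
garbled = utf8_bytes.decode("iso-8859-1")

# Encoding back with ISO-8859-1 recovers the exact original bytes,
# which can then be decoded correctly as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(len(original), len(garbled))  # the garbled string is longer
print(repaired == original)
```

Note that 'ignore' silently drops anything that does not fit, which is why part of the text is missing from the output above. The garbled sample also contains characters such as '€' that do not exist in ISO-8859-1, so the text may actually have been read as Windows-1252; trying 'cp1252' as the encoding might be worth a test.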
Bonus: I'll try to detect when the string is in the first strange format, because some of my entries are in Chinese but others are in English.
EDIT: The bonus is useless. I just use a lambda on my column to encode and decode without caring about the format. So I changed the encoding after loading the dataframe:
_encoding = 'ISO-8859-1'
_decoding = "utf-8"
df[col] = df[col].apply(lambda x : x.encode(_encoding, 'ignore').decode(_decoding , 'ignore'))
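If dropping characters with 'ignore' is a concern, one possible variant is to attempt the round trip in strict mode and keep the entry unchanged when it fails. This is only a sketch: `fix_mojibake` is a hypothetical helper name, and the sample rows are invented for illustration.

```python
def fix_mojibake(text, wrong="iso-8859-1", right="utf-8"):
    """Hypothetical helper: undo a wrong decode, or return text unchanged."""
    try:
        # Strict mode: raises instead of silently dropping characters.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip failed, so this entry was probably not garbled
        # in this particular way; leave it alone.
        return text


rows = [
    "Dyson Airwrap HS01",  # plain ASCII, unaffected by the round trip
    "æ£®",                 # UTF-8 bytes of "森" read as ISO-8859-1
    "森",                  # already correct, cannot be encoded as ISO-8859-1
    "café",                # valid text whose bytes are not valid UTF-8
]
fixed = [fix_mojibake(r) for r in rows]
print(fixed)
```

With a dataframe, the same helper could be passed directly: df[col] = df[col].apply(fix_mojibake).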
Python:
JavaScript:
Otherwise do more research about decoding UTF-8.
https://gist.github.com/chrisveness/bcb00eb717e6382c5608
There's also an online UTF-8 decoder/encoder:
https://mothereff.in/utf-8
HINT:
ÙÙé-B[x¾æEz«
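As a rough sketch of how the hint can be reproduced in Python: this assumes the `\x..` escapes in the question are the raw bytes of a UTF-8 encoded string, and that the resulting code points are the values that were passed to String.fromCharCode.

```python
# The escaped string from the question, written as a Python bytes literal.
raw = b"\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab"

# Decoding as UTF-8 turns each two-byte sequence back into one character.
text = raw.decode("utf-8")

# ord() gives the code point of each character, presumably the numbers
# that String.fromCharCode() was originally called with.
codes = [ord(ch) for ch in text]

print(len(raw), len(text))
print(codes)
```

Some of the resulting characters are non-printable control characters, so they do not show up in the hint even though they are part of the decoded string.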