How do I decode this string? \xc3\x99\xc3\xa9\xc2\x87-B[x\xc2

This is what I need to decode

\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab

It is generated by String.fromCharCode(arrayPw[i]); but I don't understand how to decode it :(

Please help

2

There are 2 best solutions below

4
gotnull On

Python:

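# In Python 3 the escaped sequence must be a bytes literal, because str has no .decode()
data = b"\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab"
udata = data.decode("utf-8")  # bytes -> str
asciidata = udata.encode("ascii", "ignore")  # drops every non-ASCII character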

JavaScript:

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

Otherwise do more research about decoding UTF-8.

https://gist.github.com/chrisveness/bcb00eb717e6382c5608

There's also an online UTF-8 decoder/encoder:

https://mothereff.in/utf-8

HINT: ÙÙé-B[x¾æEz«
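
The hint above hides three non-printable characters (U+0087, U+0099 and U+0014), so printing the decoded string does not show everything. Since the question says the string was built with String.fromCharCode(arrayPw[i]), each decoded character's code point is presumably one element of arrayPw. Here is a minimal Python 3 sketch that recovers those numbers, assuming the escaped text in the question is exactly the UTF-8 byte sequence. The function name recover_code_points is made up for illustration.

```python
def recover_code_points(raw):
    """Decode UTF-8 bytes and return the code point of each character."""
    text = raw.decode("utf-8")
    return [ord(ch) for ch in text]


# The byte sequence from the question, written as a bytes literal.
raw = b"\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab"

values = recover_code_points(raw)
print(values)

# Every value here is below 256, so the list can also be viewed as raw bytes.
print(bytes(values).hex())
```

If arrayPw holds binary data such as a hash or key, the hex form is probably the more useful view than the printed characters.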

0
Nicoolasens On

Duplicate of this: https://stackoverflow.com/a/70815136/5902698

You load a dataset and it contains some strange characters. Example:

'戴森美å�‘é€\xa0型器完整版套装Dyson Airwrap HS01(铜金色礼盒版)'

In my case, I know that the strange characters are supposed to be Chinese. So I figure that whoever sent me the data encoded it in UTF-8, but somewhere along the way it was decoded as 'ISO-8859-1'.

So as a first step I encode the string back to ISO-8859-1, then I decode it as UTF-8. My lines are:

_encoding = 'ISO-8859-1'
_my_str.encode(_encoding, 'ignore').decode("utf-8", 'ignore')

Then my output is:

"'森Dyson Airwrap HS01礼'"

This works for me, but I don't really understand what happens under the hood. So feel free to tell me if you have further information.
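
As a rough illustration of what happens under the hood: UTF-8 bytes that get decoded as ISO-8859-1 turn into one garbage character per byte, and encoding that garbage back to ISO-8859-1 restores the original bytes, which can then be decoded properly. The sketch below reproduces the round trip on a short made-up sample. The helper names make_mojibake and fix_mojibake are hypothetical, and it assumes Python's ISO-8859-1 codec, which maps all 256 byte values, so nothing is lost on the way.

```python
def make_mojibake(text):
    """Simulate the bug: UTF-8 bytes wrongly decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def fix_mojibake(garbled):
    """Undo the bug: get the original bytes back, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


original = "戴森 Dyson"
garbled = make_mojibake(original)
repaired = fix_mojibake(garbled)

print(repr(garbled))  # one garbage character per UTF-8 byte
print(repaired)
```

No 'ignore' is needed here because every character in the garbled string fits in ISO-8859-1. With 'ignore', characters that do not fit are silently dropped, which would explain why part of the text is missing from the output above.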

Bonus: I'll try to detect when the str is in the first strange format, because some of my entries are in Chinese but others are in English.

EDIT: The bonus is unnecessary. I just use a lambda on my column to encode and decode without caring about the format. So I change the encoding after loading the dataframe:

_encoding = 'ISO-8859-1'
_decoding = "utf-8"
df[col] = df[col].apply(lambda x : x.encode(_encoding, 'ignore').decode(_decoding , 'ignore'))
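
One caveat with 'ignore': an entry that is already correct Chinese loses its characters, because they cannot be encoded to ISO-8859-1 and are dropped. A possible alternative, sketched here under the assumption that each entry is either fully garbled or fully fine, is to use strict error handling and keep the original value whenever the round trip fails. The function name repair_if_mojibake is hypothetical.

```python
def repair_if_mojibake(value, wrong="iso-8859-1", right="utf-8"):
    """Return the repaired string, or the value unchanged if it is not mojibake."""
    if not isinstance(value, str):
        return value  # leave None, NaN and other non-string entries alone
    try:
        # Strict mode: raises if the text is not UTF-8 misread as ISO-8859-1.
        return value.encode(wrong).decode(right)
    except UnicodeError:
        return value


samples = [
    "Dyson Airwrap HS01",                          # plain English, unchanged
    "戴森",                                         # already correct, unchanged
    "戴森".encode("utf-8").decode("iso-8859-1"),    # garbled, gets repaired
    "café",                                        # genuine Latin-1 text, unchanged
    None,                                          # missing value, unchanged
]
cleaned = [repair_if_mojibake(s) for s in samples]
print(cleaned)
```

On a dataframe this could be applied as df[col] = df[col].apply(repair_if_mojibake). An entry that mixes correct and garbled characters in the same string would be left as-is by this sketch and needs a more specialised tool.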