Failed to apply unicode-escape in pandas


I am cleaning a tweet dataset by converting unwanted byte-escape sequences (e.g. \xf0\x9f\x99\x82) back into readable characters. Here is the code without a function:

b = data_tweet['Tweet']
b.head()

for i in b:
    x = i.encode('utf-8')
    y = x.decode('unicode-escape')
    print(y)

It worked. The characters became 🙄, 🥰, etc.

But when I wrapped the same logic in a function so that I could write the result to a CSV file, it failed. The byte escapes stay the same (e.g. \xf0\x9f\x99\x82). Here is the code:

def convert(text):
    for i in text:
        x = i.encode('utf-8')
        y = x.decode('unicode-escape')

    return text

convert(data_tweet['Tweet'])

Does anyone know why?

There is 1 best solution below

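The problem is that you never assign the result back to data_tweet['Tweet']: the function computes y on each iteration but discards it and returns the original text unchanged. You can use apply() on the Series instead.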

def convert(text):
    # text is a single string here, because apply() calls convert once per row
    x = text.encode('utf-8')
    y = x.decode('unicode-escape')

    return y

data_tweet['Tweet'] = data_tweet['Tweet'].apply(convert)

Or

data_tweet['Tweet'] = data_tweet['Tweet'].apply(lambda text: text.encode('utf-8').decode('unicode-escape'))
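
To make the difference concrete, here is a small self-contained sketch. The sample DataFrame and the names convert_discarding and convert_utf8_escapes are made up for illustration and are not from the question; it also assumes the column stores the escapes as literal text (a backslash followed by xf0 and so on). It compares the looping function, which returns its input untouched, with the apply() version.

```python
import pandas as pd

# Hypothetical sample data: the escapes are literal backslash sequences in the text.
data_tweet = pd.DataFrame(
    {"Tweet": ["good morning \\xf0\\x9f\\x99\\x82", "plain text"]}
)

def convert_discarding(series):
    # Mirrors the function from the question: y is computed and then thrown away.
    for i in series:
        y = i.encode("utf-8").decode("unicode-escape")
    return series  # the original Series, unchanged

def convert(text):
    # Per-string version from the answer, called once per row by apply().
    return text.encode("utf-8").decode("unicode-escape")

def convert_utf8_escapes(text):
    # Assumption: the \xNN escapes spell out UTF-8 bytes. unicode-escape maps each
    # \xNN to one code point, so the extra latin-1 round trip regroups them into
    # bytes that can be decoded as UTF-8. This would fail on \uNNNN escapes
    # above U+00FF.
    decoded = text.encode("utf-8").decode("unicode-escape")
    return decoded.encode("latin-1").decode("utf-8")

unchanged = convert_discarding(data_tweet["Tweet"])
decoded = data_tweet["Tweet"].apply(convert)
emoji = data_tweet["Tweet"].apply(convert_utf8_escapes)

print(unchanged.tolist())
print(decoded.tolist())
print(emoji.tolist())
```

If the apply() result shows characters such as ð instead of emoji, the escapes probably encode UTF-8 bytes, and something like the latin-1 round trip in convert_utf8_escapes may be needed. Either way, the result has to be assigned back to data_tweet['Tweet'] before calling to_csv().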