I'm cleaning a tweet dataset by removing annoying characters that show up as escaped bytes (e.g. \xf0\x9f\x99\x82). Here's the code without using a function:
b = data_tweet['Tweet']
b.head()
for i in b:
    x = i.encode('utf-8')
    y = x.decode('unicode-escape')
    print(y)
It worked. The characters became 🙄, 🥰, etc.
But when I wrapped it in a function, so that I could write the result to a CSV file, it failed. The byte sequences stay the same (e.g. \xf0\x9f\x99\x82). Here's the code:
def convert(text):
    for i in text:
        x = i.encode('utf-8')
        y = x.decode('unicode-escape')
    return text
convert(data_tweet['Tweet'])
Does anyone know why?
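The problem is that you never assign the result back to data_tweet['Tweet']. Inside the function, y is computed and then discarded, and the original text is returned unchanged. You can use apply() on the Series. Or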
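To illustrate the apply() suggestion, here is a minimal sketch, not necessarily what the answer's author had in mind. It keeps the question's encode/decode step unchanged but makes convert work on a single string and return the decoded value. The sample strings and the CSV file name are hypothetical, and the pandas lines are shown as comments so the snippet runs without the question's DataFrame.

```python
def convert(text: str) -> str:
    """Decode literal escape sequences in one string, as in the question."""
    # Same two steps as the question, but the decoded value is returned
    # instead of being thrown away.
    return text.encode('utf-8').decode('unicode-escape')


# Hypothetical sample data: the first string holds a literal backslash-u
# escape (six characters), not an actual heart symbol.
tweets = [r'I \u2764 pandas', 'no escapes here']

# The result has to be stored; calling convert() on its own changes nothing.
cleaned = [convert(t) for t in tweets]

# With the question's DataFrame, the equivalent would be:
# data_tweet['Tweet'] = data_tweet['Tweet'].apply(convert)
# data_tweet.to_csv('tweets_clean.csv', index=False)  # hypothetical file name
```

Because apply() calls convert once per row and builds a new Series from the return values, assigning that Series back to the column is what makes the change visible when the DataFrame is later written out.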