I want to replace all the happy emoticons in a text file with "HAPPY", and likewise all the sad emoticons with "SAD". But the code isn't working properly. It does detect smileys (for now only :-) ), but in the example below it doesn't replace the emoticon with the text; it simply adds the text, and it does so twice, for reasons I don't understand.
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD", ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}
#THE INPUT TEXT#
a = "guys beautifully done :-)"
for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i) == i.decode('utf-8', 'replace'):
        print i
THE INPUT TEXT
a="guys beautifully done :-)"
OUTPUT ("HAPPY" is printed twice, and the emoticon isn't removed)
guys
-
beautifully
done
HAPPY
HAPPY
:-)
EXPECTED OUTPUT
guys
beautifully
done
HAPPY
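You are turning each word and each emoticon into a set; this means you are looking for overlap of individual characters. That is why HAPPY is printed twice: both :-) and :) consist only of characters that appear in the word :-). The continue statements only skip to the next iteration of the inner loop, so the final print still outputs the emoticon itself. You probably wanted to use exact matches instead.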
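A minimal sketch of the exact-match version, assuming the dictionaries and input string from the question. The matches list is not part of the original code; it is only added here so the result can be inspected. print() with a single argument behaves the same on Python 2 and Python 3.

```python
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

a = "guys beautifully done :-)"

matches = []  # illustrative only: records what gets printed
for word in a.split():
    for key in dict_happy:
        if key == word:  # exact match, not character overlap
            matches.append("HAPPY")
            print("HAPPY")
    for key in dict_sad:
        if key == word:
            matches.append("SAD")
            print("SAD")
```

With exact comparison, only the key :-) matches the last word, so the label is produced once.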
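You can iterate over dictionaries directly; there is no need to call .keys() there. You don't actually appear to be using the dictionary values, so you could just use a plain membership test (word in dict_happy) instead of looping over the keys, and then perhaps use sets instead of dictionaries. The whole check can then be reduced to a set intersection between the words and the dictionary keys.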
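Both steps could look roughly like the sketch below, again assuming the question's dictionaries and input. The names labels, has_happy and has_sad are made up for illustration. The intersection relies on dict.keys() returning a set-like view, which is the Python 3 behaviour; on Python 2 the equivalent call is dict.viewkeys().

```python
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

a = "guys beautifully done :-)"

# Step 1: membership test per word, no inner loop over the keys.
labels = []  # illustrative only
for word in a.split():
    if word in dict_happy:
        labels.append("HAPPY")
    if word in dict_sad:
        labels.append("SAD")
print(labels)

# Step 2: intersect the key view with the set of words.
# Python 3: keys() is set-like. On Python 2 use dict_happy.viewkeys() instead.
words = set(a.split())
has_happy = bool(dict_happy.keys() & words)
has_sad = bool(dict_sad.keys() & words)
if has_happy:
    print("HAPPY")
if has_sad:
    print("SAD")
```

Note that the intersection only tells you whether any emoticon of that kind is present; it no longer reports one label per occurrence.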
The reduced form uses the dictionary view on the keys as a set. Still, it would be better to use actual sets in that case.
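A sketch with plain sets, assuming the values are never needed. The names sad_emoticons, happy_emoticons and found are hypothetical and not from the question.

```python
# Hypothetical set versions of the question's dictionaries (keys only).
sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}

a = "guys beautifully done :-)"
words = set(a.split())

found = []  # illustrative only: records which labels were printed
if happy_emoticons & words:  # non-empty intersection is truthy
    found.append("HAPPY")
    print("HAPPY")
if sad_emoticons & words:
    found.append("SAD")
    print("SAD")
```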
If you wanted to remove the emoticon from the text, you'll have to filter the words:
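One way to do that filtering, sketched under the assumption that the question's dictionaries are kept as they are. The output list is only there to make the result easy to check.

```python
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

a = "guys beautifully done :-)"

output = []  # illustrative only
for word in a.split():
    if word in dict_happy:
        output.append("HAPPY")   # emoticon replaced by its label
    elif word in dict_sad:
        output.append("SAD")
    else:
        output.append(word)      # ordinary word, kept unchanged

for line in output:
    print(line)
```

This prints guys, beautifully, done and HAPPY on separate lines, which is the expected output from the question.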
Or, better still, combine the two dictionaries and use dict.get(). Here I pass in the current word both as the look-up key and the default; if the current word is not an emoticon, the word itself is printed, otherwise the word SAD or HAPPY is printed instead.
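A sketch of the combined-dictionary approach, assuming the two dictionaries from the question. The names emoticons and replace_emoticons are made up for this example.

```python
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

# Merge both mappings into one lookup table (hypothetical name).
emoticons = {}
emoticons.update(dict_sad)
emoticons.update(dict_happy)


def replace_emoticons(text):
    """Return the words of text, with known emoticons replaced by their label."""
    # get(word, word): the word is both the key and the fallback value.
    return [emoticons.get(word, word) for word in text.split()]


for line in replace_emoticons("guys beautifully done :-)"):
    print(line)
```

Because the values are now actually used, keeping dictionaries rather than sets makes sense here.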