unbalanced parenthesis regex

!pip install emot
import re
from emot.emo_unicode import EMOTICONS_EMO
def convert_emoticons(text):
    for emot in EMOTICONS_EMO:
        text = re.sub(u'\('+emot+'\)', "_".join(EMOTICONS_EMO[emot].replace(",","").split()), text)
        return text

text = "Hello :-) :-)"
convert_emoticons(text)

I'm trying to run the above code in Google Colab, but it raises the following error: unbalanced parenthesis at position 4

My understanding from the re module documentation is that '\(' + any_expression + '\)' is the correct way to do this, but I still get the error. So I have tried replacing '\(' + emot + '\)' with:

  1. '(' + emot + ')', it gives the same error
  2. '[' + emot + ']', it gives the following output: Hello Happy_face_or_smiley-Happy_face_or_smiley Happy_face_or_smiley-Happy_face_or_smiley

The correct output should be Hello Happy_face_smiley Happy_face_smiley for text = "Hello :-) :-)"

Can someone help me fix the problem?

David542 On BEST ANSWER

This is pretty tricky using regex, as you'd first need to escape the regex metacharacters contained in the emoticons, such as the parentheses in :) and :(. An unescaped parenthesis from the emoticon ends up in the pattern, which is why you get the unbalanced parenthesis error. So you'd need to do something like this first:

>>> print(re.sub(r'([()...])', r'%s\1' % '\\\\', ':)'))
:\)
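
For what it's worth, the standard library's re.escape does this escaping for every metacharacter at once, so the character class above doesn't have to be maintained by hand. A minimal sketch, not tested against the real emot data: SAMPLE_EMOTICONS and convert_emoticons_regex are hypothetical names, and the small dict is a stand-in for EMOTICONS_EMO so the snippet runs without the package installed.

```python
import re

# Hypothetical stand-in for emot's EMOTICONS_EMO (assumption: same
# emoticon -> description shape), so this runs without the package.
SAMPLE_EMOTICONS = {":-)": "Happy face smiley", ":-(": "Frown, sad"}


def convert_emoticons_regex(text, mapping=SAMPLE_EMOTICONS):
    for emot, meaning in mapping.items():
        # re.escape backslash-escapes the metacharacters in the emoticon,
        # so the pattern matches the emoticon's literal characters.
        pattern = re.escape(emot)
        label = "_".join(meaning.replace(",", "").split())
        # A callable replacement is inserted as-is, with no escape processing.
        text = re.sub(pattern, lambda match: label, text)
    return text


# The pattern built in the question fails to compile, because the ")"
# that comes from the emoticon is not escaped.
try:
    re.compile(r"\(" + ":-)" + r"\)")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

result = convert_emoticons_regex("Hello :-) :-(")
```

Here result should come out as 'Hello Happy_face_smiley Frown_sad', and compile_error holds the same kind of message as in the question.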

But I'd suggest just doing a straight string replacement, since you already have a mapping that you're iterating through. So we'd have:

from emot.emo_unicode import EMOTICONS_EMO
def convert_emoticons(text):
    for emot in EMOTICONS_EMO:
        text = text.replace(emot, EMOTICONS_EMO[emot].replace(" ","_"))
    return text


text = "Hello :-) :-)"
convert_emoticons(text)
# 'Hello Happy_face_smiley Happy_face_smiley'
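
One caveat with replacing emoticons one at a time: if one emoticon is a prefix of another (say ":-)" and ":-))"), the result depends on the order of the dict. A possible refinement is a single pass over one combined pattern that tries longer emoticons first. This is only a sketch under assumptions: SAMPLE_EMOTICONS and build_emoticon_replacer are hypothetical names, the overlapping entries are made up for illustration, and whether EMOTICONS_EMO really contains such overlaps is not something verified here.

```python
import re

# Hypothetical sample data with overlapping emoticons (not taken from emot).
SAMPLE_EMOTICONS = {
    ":-)": "Happy face smiley",
    ":-))": "Very happy",
    ":-(": "Frown, sad",
}


def build_emoticon_replacer(mapping):
    # Longest emoticons first, so ":-))" is tried before its prefix ":-)".
    ordered = sorted(mapping, key=len, reverse=True)
    # One alternation of escaped literals; the text is scanned only once,
    # so inserted labels are never re-scanned for emoticons.
    pattern = re.compile("|".join(re.escape(emot) for emot in ordered))

    def to_label(match):
        meaning = mapping[match.group(0)]
        return "_".join(meaning.replace(",", "").split())

    def replace(text):
        return pattern.sub(to_label, text)

    return replace


convert = build_emoticon_replacer(SAMPLE_EMOTICONS)
result = convert("Hello :-) :-))")
```

With the sample data above, result should be 'Hello Happy_face_smiley Very_happy'.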