Remove timestamps in parentheses from text in Python


I'd like to remove all the parenthesized timestamps from the sample text below.

Input:

Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I have a question about X. ( 8m 1s ) Agent: I can help here. Log in this website (remember to use your new password) ( 11m 31s )

Expected Output:

Agent: Can I help you? Customer: Thank you Customer: I have a question about X. Agent: I can help here. Log in this website (remember to use your new password)

I tried re.sub(r'\(.*?\)', '', data), but it did not work because it removes every parenthesized group. I want to keep the content in parentheses when it is not a timestamp; for instance, I'd like to keep "(remember to use your new password)" in the output.

I'm still new to regex, so I hope I can get some guidance here. Thank you!


There are 3 best solutions below

nishu b (BEST ANSWER)
\(\s(\d{1,2}[smh]\s)+\)

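This matches an opening parenthesis, one whitespace character, one or more groups made of one or two digits, a unit letter (s, m, or h) and a whitespace character, and then a closing parenthesis.

FYI: . matches any character except a line terminator, so \(.*?\) matches every parenthesized group, not only the timestamps.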
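
A minimal sketch of how the pattern above might be applied with re.sub. The names TIMESTAMP and strip_timestamps are hypothetical, and two details are assumptions rather than part of the answer: \s* replaces the single \s so that missing or extra spaces are tolerated, and a second substitution collapses the double spaces that removal leaves behind.

```python
import re

data = (
    "Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) "
    "Customer: I have a question about X. ( 8m 1s ) "
    "Agent: I can help here. Log in this website "
    "(remember to use your new password) ( 11m 31s )"
)

# "(" + one or more "<1-2 digits><s|m|h>" tokens + ")".
# \s* instead of \s is an assumption to tolerate uneven spacing.
TIMESTAMP = re.compile(r"\(\s*(?:\d{1,2}[smh]\s*)+\)")


def strip_timestamps(text):
    """Remove timestamp groups and tidy the leftover spacing."""
    without = TIMESTAMP.sub("", text)
    # Removing "( 3s )" leaves two adjacent spaces; collapse them.
    return re.sub(r" {2,}", " ", without).strip()


print(strip_timestamps(data))
```

Because the pattern requires digits followed by a unit letter, a group such as "(remember to use your new password)" does not match and is left in place.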

buran

This is not regex and may not be the most efficient approach, but string methods will do:

spam = "Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I have a question about X. ( 8m 1s ) Agent: I can help here. Log in this website (remember to use your new password) ( 11m 31s )"

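def cleanup(text):
    # Put each speaker turn on its own line.
    for word in ('Agent', 'Customer'):
        text = text.replace(word, f'\n{word}').strip()
    # Cut each line at its last '('. This assumes every turn ends with a
    # timestamp; rindex raises ValueError if a line contains no '('.
    clean_text = [line[:line.rindex('(')] for line in text.splitlines()]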

    # or in slow-motion
    # clean_text = []
    # for line in text.splitlines():
    #     idx = line.rindex('(')
    #     line = line[:idx]
    #     clean_text.append(line)

    return ' '.join(clean_text)

print(cleanup(spam))

Output:

Agent: Can I help you?  Customer: Thank you Customer: I have a question about X.  Agent: I can help here. Log in this website (remember to use your new password)

EDIT: As suggested by @DRPK, this can be condensed into a one-liner, which may make a difference on a large corpus:

clean_text = ' '.join([line[:line.rindex('(')] for line in text.replace("Agent", '\nAgent').replace("Customer", '\nCustomer').strip().splitlines()])

Nguyễn Văn Huy
\( [^\)]++\)

You can use this regex to replace each match with "" in your code. I generated it with http://www.amazingregex.xyz/. You can generate your own from text examples.
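
A hedged sketch of this pattern in Python. The possessive quantifier ++ is only accepted by the standard re module from Python 3.11 onward, so the version below drops it. This should match the same text, since [^)]+ can never run past a closing parenthesis and backtracking therefore changes nothing. The names PATTERN and strip_spaced_parens are hypothetical. Note that the pattern keys on the space after the opening parenthesis, not on the timestamp format, so it also removes any other group written as "( ... )".

```python
import re

# Non-possessive form of \( [^\)]++\): "(" + space + any run of
# characters other than ")" + ")".
PATTERN = re.compile(r"\( [^)]+\)")


def strip_spaced_parens(text):
    """Remove every parenthesized group that starts with '( '."""
    return PATTERN.sub("", text)


# Timestamp removed, ordinary parenthetical kept.
print(strip_spaced_parens("Thank you( 40s ) ok (keep me)"))

# Caveat: a non-timestamp group with a leading space is removed too.
print(strip_spaced_parens("note ( see docs )"))
```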