Python multiline regex capture

I have the following string:

hello
abcd
pqrs
123
123
123

My objective is to capture everything from hello up to and including the first occurrence of 123, so the expected output is:

hello
abcd
pqrs
123

I used the following:

output=re.findall('hello.*123?',input_string,re.DOTALL)

But the output is as:

['hello\nabcd\npqrs\n123\n123\n123']

Is there a way to make this match non-greedy by using ? with 123, or is there another way to get the expected output?
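A minimal sketch of the non-greedy approach the question asks about, assuming the same input string. In a regex, a ? placed after an ordinary character such as 3 only makes that character optional, so 123? means "12, optionally followed by 3". A ? placed directly after a quantifier such as * instead makes that quantifier lazy. The sketch uses re.search rather than re.findall because only the first match is needed.

```python
import re

input_string = "hello\nabcd\npqrs\n123\n123\n123"

# In 'hello.*123?' the '?' only makes the final '3' optional;
# '.*' is still greedy, so the match runs on to the last '123'.
greedy = re.search(r'hello.*123?', input_string, re.DOTALL).group(0)

# Putting '?' right after '*' makes that quantifier lazy: '.*?' grows one
# character at a time and stops at the first '123' it can reach.
lazy = re.search(r'hello.*?123', input_string, re.DOTALL).group(0)

print(repr(greedy))  # 'hello\nabcd\npqrs\n123\n123\n123'
print(repr(lazy))    # 'hello\nabcd\npqrs\n123'
```

re.DOTALL is still required so that . can cross the newlines between hello and 123.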

Best answer:

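Try using a lookahead for this. The pattern below matches a run of word characters and newlines that is followed by \n123\n. Because + is greedy, the match extends to the last position that is followed by \n123\n, which in this input is right after the first 123: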

import re

input_string = """hello
abcd
pqrs
123
123
123"""

# Raw string, so \w reaches the regex engine without an invalid-escape warning
output_string = re.search(r'[\w\n]+(?=\n123\n)', input_string).group(0)

print(output_string)

#hello
#abcd
#pqrs
#123
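A small check of why the lookahead version returns the expected text, using a hypothetical helper named capture and a second, assumed input with a fourth 123 line (not from the question). The greedy + backtracks only as far as the last position followed by \n123\n, so the result depends on how many 123 lines come after the first one.

```python
import re

# The lookahead pattern from the answer, as a raw string.
PATTERN = r'[\w\n]+(?=\n123\n)'

def capture(text):
    """Hypothetical helper: return the matched text, or None if no match."""
    match = re.search(PATTERN, text)
    return match.group(0) if match else None

three = "hello\nabcd\npqrs\n123\n123\n123"        # the question's input
four = "hello\nabcd\npqrs\n123\n123\n123\n123"    # assumed extra 123 line

print(repr(capture(three)))  # 'hello\nabcd\npqrs\n123'
print(repr(capture(four)))   # 'hello\nabcd\npqrs\n123\n123'
```

If the goal is always to stop at the first 123, the lazy form hello.*?123 with re.DOTALL does not depend on how many 123 lines follow.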

I hope this proves useful.