Regular expression causing segfault/stack overflow

771 Views Asked by At

(or so I think)...

I'm using boost::xpressive as my regular expression engine to parse something and I get a segfault. I suspect that recursivity and my bad regular expression are to blame, because gdb shows more than 300 stack frames. So, here is my (case-sensitive) regular expression, in perl/python notation:

begin([^e]+)e((?:[^b]|b(?!egin))+)

which I expect to match

beginHEADER HEREeFOLLOWED BY SOME LONG LONG TEXT THAT GOES UNTIL NEXTbegin

with the first uppercase text (HEADER HERE) in the first group, and the second uppercase text in the second group. I always get the segfault if the text that should match group 2 is very long.

Why shouldn't this work?

1

There are 1 best solutions below

4
On BEST ANSWER

You can simplify your regular expression a lot by simply using non-greedy matching:

begin(.+?)e(.+?)begin

Try that and see if it works better.

It's likely that your original regex was causing a stack overflow due to a recursive implementation of the |-or grouping in your regex, which was potentially branching with every single character in your second group.

A simple non-greedy .+? on the other hand doesn't need to branch for every single character.