I'm using Xpressive extensively in my current embedded C++ project.
As I know, Xpressive is a great user of the stack. But are there Xpressive regex approaches that are more stack efficient?
E.g. a regex to match a string representing a 32-bit integer may need to test if digit number six is less than or equal to 6.
Xpressive (and other regex engines too, I know) allows numerous approaches, e.g.:
range('0','6')
or
('0'|'1'|'2'|'3'|'4'|'5'|'6')
or
set['0'|'1'|'2'|'3'|'4'|'5'|'6']
and then the regex may allow for 3 following digits, e.g.:
repeat<3>(_d) >> _b
or
_d >> _d >> _d >> _b
But, given the choice, and not caring too much about source code layout, which approach is optimal for:
a) stack;
b) speed;
c) ?
Sequencing and repetition use up stack because they require backtracking. Let's look at your examples.
This is implemented as a quick check to see if a character falls within a character range. It doesn't use sequencing or repetition, so it doesn't gobble stack space.
In general, don't put in an alternate what can be put in a set or a range. They are far more efficient for this kind of thing. I'll note that obviously this isn't a valid xpressive regex. You would write it as:
I'm sure you know that. I'll make a similar edit for this one:
Sets are fast. They are implemented as a look-up table for the ascii range and (for Unicode) a sparse vector for character codes above 256. Because of the sparseness of some sets, this scheme can take up more heap space than a simple range. But that doesn't effect the amount of stack space used when matching one. I'll note you can also write it as:
Now for repetition:
As I said before, repetition is usually implemented with recursion to get the backtracking right. But in this case, xpressive recognizes that
_d
can only ever eat one character, and that it's being repeated exactly 3 times. So in this limited case, it's implemented with a loop to conserve stack space. This is in contrast to the next case:Xpressive isn't smart enough to treat
_d >> _d >> _d
the same way asrepeat<3>(_d)
. As a result, this will consume a separate stack frame for each_d
.Short version: prefer specialized constructs where you can over their more general equivalents. That gives xpressive more context for optimization.
In addition, get to know independent sub-expressions. If you can put something in a
keep()
, go for it. Why?keep()
matches a sub-expression, then reclaims the stack space used by that sub-match. That makes it impossible to try different things by backtracking into that sub-expression, but sometimes you don't need that.Hope that helps.