I have been trying to resolve this for the past 2 days...
Please help me understand why this is happening. My intention is to select only the <HDR>...</HDR> block that contains <DTL1 val="92">.
This is my regular expression:
(?<=<HDR>).*?<DTL1\sval="92".*?</HDR>
And the input string is:
<HDR>abc<DTL1 val="1"><DTL2 val="2"></HDR><HDR><DTL1 val="92"><DTL2 val="55"></HDR><HDR><DTL1 val="3"><DTL2 val="4"></HDR>
But this regular expression selects
abc<DTL1 val="1"><DTL2 val="2"></HDR><HDR><DTL1 val="92"><DTL2 val="55"></HDR>
Can anyone please help me?
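A regex engine always gives you the leftmost match in a string, even if you use a non-greedy quantifier. This is exactly what you obtain: the lookbehind succeeds right after the first <HDR>, and the lazy .*? then expands one character at a time, across </HDR><HDR> if necessary, until the rest of the pattern can match.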
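To see the leftmost-match behaviour concretely, here is a minimal Python sketch that runs the question's pattern against the question's input, using val="92" (the value the question says it wants). The variable names are only illustrative.

```python
import re

# Input string from the question, split across lines for readability.
text = ('<HDR>abc<DTL1 val="1"><DTL2 val="2"></HDR>'
        '<HDR><DTL1 val="92"><DTL2 val="55"></HDR>'
        '<HDR><DTL1 val="3"><DTL2 val="4"></HDR>')

# The question's pattern, looking for val="92".
pattern = re.compile(r'(?<=<HDR>).*?<DTL1\sval="92".*?</HDR>')

m = pattern.search(text)

# The match starts right after the FIRST <HDR> (offset 5), not the second one:
# the engine takes the leftmost start position where the pattern can succeed.
print(m.start())
print(m.group())
```

The lazy quantifier only makes the match as short as possible from a given start position; it does not move the start position to the right.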
So a solution is to forbid the presence of another <HDR> in the parts described by .*?, which is too permissive. There are two techniques for doing that, each of which replaces .*? with a more restrictive subpattern.
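As a hedged sketch, replacements of this kind commonly take one of the two forms below. These are assumed forms, not necessarily the exact subpatterns meant above, and the constant names are hypothetical. The first walks over runs of non-< characters and only accepts a < that does not begin <HDR>; the second tests a negative lookahead before every single character.

```python
import re

# Assumed form 1: runs of non-"<" characters, separated by a "<" that is
# not the start of "<HDR>" (written as an unrolled loop so it stays safe
# even without atomic groups or possessive quantifiers).
NO_HDR_RUNS = r'[^<]*(?:<(?!HDR>)[^<]*)*'

# Assumed form 2: any character, as long as "<HDR>" does not start there.
NO_HDR_LOOKAHEAD = r'(?:(?!<HDR>).)*'

sample = 'abc<DTL1 val="1"></HDR><HDR><DTL1 val="92">'

# Both forms consume text greedily but stop in front of the next <HDR>.
results = [re.match(sub, sample).group() for sub in (NO_HDR_RUNS, NO_HDR_LOOKAHEAD)]
for r in results:
    print(r)
```

Either form can be dropped in wherever .*? appears in the original pattern.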
Most of the time the first technique performs better, but if your string contains a high density of <, the second can give good results too. Using a possessive quantifier or an atomic group can reduce the number of steps needed to obtain a result, in particular when the subpattern fails.
Example:
With the first way:
or this variant:
With the second way:
or this variant:
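For the input string from the question, both ways can be sketched in Python as follows. The subpatterns are assumed forms rather than quotations, find_block is a hypothetical helper, and neither atomic groups nor possessive quantifiers are used, so the sketch does not depend on engine-specific features.

```python
import re

text = ('<HDR>abc<DTL1 val="1"><DTL2 val="2"></HDR>'
        '<HDR><DTL1 val="92"><DTL2 val="55"></HDR>'
        '<HDR><DTL1 val="3"><DTL2 val="4"></HDR>')

# First way (assumed form): runs of non-"<", plus any "<" not starting "<HDR>".
SKIP_RUNS = r'[^<]*(?:<(?!HDR>)[^<]*)*'
# Second way (assumed form): lookahead checked before every character.
SKIP_LOOKAHEAD = r'(?:(?!<HDR>).)*?'

def find_block(text, val, skip):
    """Return the content after <HDR> whose DTL1 has the given val, or None."""
    pattern = (r'(?<=<HDR>)' + skip
               + r'<DTL1\sval="' + re.escape(val) + r'"'
               + skip + r'</HDR>')
    m = re.search(pattern, text)
    return m.group() if m else None

first = find_block(text, "92", SKIP_RUNS)
second = find_block(text, "92", SKIP_LOOKAHEAD)
print(first)
print(second)
```

Because neither subpattern can cross into another <HDR>, the attempt that starts after the first <HDR> now fails, and the engine moves on until it reaches the block that really contains val="92".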