How to grep-search for two occurrences of a character in a lookbetween


I seem to have to relearn regex and grep syntax every time I need something advanced. This time, even with BBEdit's Pattern Playground, I can't work this one out.

I need to do a multi-line search for the occurrence of two literal asterisks anywhere in the text between a pair of tags in a plist/XML file.

I can successfully construct a "lookbetween" (a lookbehind plus a lookahead) like this:

(?s)(?<=<array>).*?(?=</array>)
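A quick way to see what that lookbehind/lookahead pair captures is a small Python sketch. It uses Python's built-in re module as a stand-in for BBEdit's grep engine (an assumption: both treat these basic constructs the same way), and the plist string is a made-up sample.

```python
import re

# Hypothetical plist fragment, standing in for the real file.
plist = """<array>
<string>alpha</string>
</array>
<array>
<string>beta</string>
</array>"""

# (?s) lets . match newlines; the lookbehind and lookahead are zero-width,
# so each match is only the text between <array> and the next </array>.
between = re.compile(r"(?s)(?<=<array>).*?(?=</array>)")

for m in between.finditer(plist):
    print(repr(m.group()))
```

Each captured string begins and ends with a newline: the lookarounds are zero-width, so the tags themselves are never part of the match.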

I then tried to restrict it to match only where two asterisks appear between the tags:

(?s)(?<=<array>).*?[*]{2}.*?(?=</array>)
(?s)(?<=<array>).+[*]{2}.+(?=</array>)
(?s)(?<=<array>).+?[*]{2}.+?(?=</array>)

But they find nothing. And when I remove the {2}, I realize I'm not even constructing it right to find a single asterisk. I tried escaping the character as /* and [/*], but to no avail.
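For reference, the escape character in regex is the backslash, not the forward slash. This sketch (again Python's re standing in for BBEdit, with a made-up sample string) compares the spellings mentioned above:

```python
import re

text = "blah * blah"

# Backslash escape and one-character class: both match a literal '*'.
print(re.search(r"\*", text).group())   # *
print(re.search(r"[*]", text).group())  # *

# A forward slash is not an escape: /* means "zero or more slashes",
# which succeeds with an empty match at position 0.
print(re.search(r"/*", text).span())    # (0, 0)

# [/*] is a class of '/' or '*', so it still finds the asterisk.
print(re.search(r"[/*]", text).group())  # *
```

So \* and [*] are both valid ways to write a literal asterisk, while /* is not an escape at all.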

How can I match any occurrence of "blah blah * blah blah * blah blah"?



Best answer

[*]{2} means the two asterisks must be consecutive.
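A minimal check of that point, sketched with Python's re module on made-up strings:

```python
import re

# [*]{2} is exactly two asterisks back to back, i.e. the same as \*\*.
consecutive = re.compile(r"[*]{2}")

print(bool(consecutive.search("blah ** blah")))        # True
print(bool(consecutive.search("blah * blah * blah")))  # False
```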

(.*[*]){2} is what you're looking for: it matches two asterisks, with anything before and between them.
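The same idea in a short Python sketch (re module as a stand-in for BBEdit; the sample strings are hypothetical):

```python
import re

# Two repetitions of "anything, then an asterisk": the asterisks may be
# separated by any text. re.DOTALL plays the role of (?s).
two_stars = re.compile(r"(.*[*]){2}", re.DOTALL)

print(bool(two_stars.search("blah blah * blah blah * blah blah")))  # True
print(bool(two_stars.search("blah blah * blah blah")))              # False
```

On its own this pattern knows nothing about tags, so the two asterisks could come from different <array> blocks.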

But we also need to make sure the regex only tests one tag pair at a time. So instead of .*, use ((?!<\/array>).)*, which consumes a character only where </array> does not begin there, so the match can never run past the closing tag.
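To see why the tempered token matters, here is a hypothetical input where each <array> holds only one asterisk. The sketch uses Python's re module (an assumption, standing in for BBEdit) and drops the \/ escapes, which Python does not need:

```python
import re

# Hypothetical input: each <array> holds ONE asterisk, two in total.
xml = "<array>a *</array><array>b *</array>"

# Plain .*? can run past </array> into the next block: a false positive.
naive = re.compile(r"(?s)(?<=<array>)(?:.*?[*]){2}")
# Tempered token: a character is consumed only if </array> does not start there.
tempered = re.compile(r"(?s)(?<=<array>)(?:(?:(?!</array>).)*?[*]){2}")

print(naive.search(xml).group())  # a *</array><array>b *
print(tempered.search(xml))       # None
```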

The regex can be written as:

(?s)(?<=<array>)(?:((?!<\/array>).)*?[*]){2}(?1)*
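Python's built-in re module has no (?1) subroutine call, so this sketch inlines a copy of group 1 in its place, which should be equivalent for this pattern. The \/ escapes are dropped as unnecessary, and the sample plist is made up:

```python
import re

# Best answer's pattern with (?1) replaced by a copy of group 1.
pattern = re.compile(
    r"(?s)(?<=<array>)"              # start right after <array>
    r"(?:((?!</array>).)*?[*]){2}"   # two asterisks, never crossing </array>
    r"(?:(?!</array>).)*"            # rest of the block, up to </array>
)

# Hypothetical input: the first block has two asterisks, the second only one.
plist = """<array>
<string>blah * blah</string>
<string>blah * blah</string>
</array>
<array>
<string>only one * here</string>
</array>"""

for m in pattern.finditer(plist):
    print(repr(m.group()))
```

Only the first block is returned; the second holds a single asterisk, so the {2} repetition runs into </array> and fails.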


Second answer

Use

(?s)(?<=<array>)(?:(?:(?!<\/?array>)[^*])*[*]){2}.*?(?=</array>)
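This pattern needs no subroutine call, so it can be tried unchanged in Python's re module (used here as a stand-in for BBEdit; the inputs are hypothetical):

```python
import re

# Second answer's pattern verbatim; Python accepts the \/ escapes as-is.
pattern = re.compile(
    r"(?s)(?<=<array>)(?:(?:(?!<\/?array>)[^*])*[*]){2}.*?(?=</array>)"
)

# Hypothetical input: the first block has two asterisks, the second only one.
plist = """<array>
<string>blah * blah</string>
<string>blah * blah</string>
</array>
<array>
<string>only one * here</string>
</array>"""

print([m.group() for m in pattern.finditer(plist)])

# One asterisk per block: the count never carries across </array>.
print(pattern.search("<array>a *</array><array>b *</array>"))  # None
```

The (?!<\/?array>) guard is what keeps the count inside a single block, as the second search shows.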


Explanation

NODE                   EXPLANATION
(?s)                   set flags for this block (with . matching \n)
                       (case-sensitive) (with ^ and $ matching normally)
                       (matching whitespace and # normally)
(?<=                   look behind to see if there is:
  <array>                '<array>'
)                      end of look-behind
(?:                    group, but do not capture (2 times):
  (?:                    group, but do not capture (0 or more times
                         (matching the most amount possible)):
    (?!                    look ahead to see if there is not:
      </?array>              </array> or <array>
    )                      end of look-ahead
    [^*]                   any character except: '*'
  )*                     end of grouping
  [*]                    any character of: '*'
){2}                   end of grouping
.*?                    any character (0 or more times (matching the
                       least amount possible))
(?=                    look ahead to see if there is:
  </array>               '</array>'
)                      end of look-ahead
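The explanation table maps almost line for line onto Python's re.VERBOSE syntax, where whitespace is ignored and # starts a comment. This is a sketch of the same pattern written that way (re.DOTALL replaces the inline (?s); the sample string is made up):

```python
import re

pattern = re.compile(
    r"""
    (?<=<array>)          # look behind for '<array>'
    (?:                   # non-capturing group, exactly 2 times:
      (?:                 #   non-capturing group, 0 or more times (greedy):
        (?!</?array>)     #     not at the start of '<array>' or '</array>'
        [^*]              #     any character except '*'
      )*
      [*]                 #   a literal '*'
    ){2}
    .*?                   # any characters, as few as possible
    (?=</array>)          # look ahead for '</array>'
    """,
    re.DOTALL | re.VERBOSE,
)

sample = "<array>x * y * z</array><array>only * one</array>"
print([m.group() for m in pattern.finditer(sample)])  # ['x * y * z']
```

The verbose form should match exactly what the one-line pattern matches; it is only easier to read and edit.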