Group in regex that matches every substring that doesn't start with a specific character


I am trying to write a group in a regex that matches every substring except for the ones that start with a ' " '

In short, my regex matches something that starts with 2 personal names and ends with a 10-digit ID. It has 3 main groups: the names, the middle part and the ID.

So it has to match

Jennifer Ann from New York, "Wisdom" str, bl 54, B, with id 1234567890

in

her name was Jennifer Ann from New York, "Wisdom" str, bl 54, B, with id 1234567890 which is very rare

but not match anything in:

her name was Jennifer Ann" from New York, "Wisdom" str, bl 54, B, with id 1234567890 which is very rare

because of the quotes after Ann. Right now my middle part group looks like this:

(?'compositeMiddle'.*?) which matches everything. I want to make it match everything except for the substrings that start with: "


There are 2 best solutions below

2
On BEST ANSWER

Seems like you want something like this,

^[A-Z][a-z]+\s[A-Z][a-z]+(?:[^"']|"[^"]*"|'[^']*')*?\b\d{10}$


In (?:[^"']|"[^"]*"|'[^']*')*?, the regex engine first tries [^"'], which matches any single character other than ' or ". When it finds a double quote, [^"'] fails and the engine tries the next alternative, "[^"]*", which matches a complete double-quoted string such as "foo" or "bar". When it finds a ' instead, it falls through to the third alternative, '[^']*'. The enclosing group is repeated zero or more times (lazily), so the middle part can contain quotes only as properly paired quoted strings; a stray " right after the names makes the match fail.
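To see the alternation at work, here is a minimal Python sketch. The pattern is the one above, unchanged; the helper name is_record and the two test strings (the question's examples with the surrounding sentence trimmed off) are my own, for illustration only.

```python
import re

# The pattern from this answer, unchanged; the syntax is also valid for Python's re.
RECORD = re.compile(
    r"""^[A-Z][a-z]+\s[A-Z][a-z]+(?:[^"']|"[^"]*"|'[^']*')*?\b\d{10}$"""
)

def is_record(text):
    """Return True if the whole string is two names, a middle part and a 10-digit id."""
    return RECORD.search(text) is not None

good = 'Jennifer Ann from New York, "Wisdom" str, bl 54, B, with id 1234567890'
bad = 'Jennifer Ann" from New York, "Wisdom" str, bl 54, B, with id 1234567890'

print(is_record(good))  # True: "Wisdom" is a properly paired quoted string
print(is_record(bad))   # False: the stray quote after Ann leaves a quote unpaired
```

Note that because of the ^ and $ anchors, this only matches when the record is the entire input. The full sentences in the question have text before and after the record, so the anchors would probably need to be dropped or adjusted there.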

2
On
(?'compositeMiddle'[^"].*)

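[abc] is a character class that matches any one of the listed characters, and a leading ^ negates it. So [^"] matches any single character except a double quote, which means the group cannot start with ".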
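A quick way to check this is a small Python sketch, under two assumptions of mine: the group is tested on its own rather than inside the full regex, and the named group is written as (?P<compositeMiddle>...) because Python's re does not accept the (?'name'...) spelling. The helper name middle_part is made up for the example.

```python
import re

# Python spells named groups (?P<name>...); (?'name'...) is not accepted by re.
MIDDLE = re.compile(r'(?P<compositeMiddle>[^"].*)')

def middle_part(text):
    """Return the captured middle part, or None if text is empty or starts with a double quote."""
    m = MIDDLE.match(text)  # match() only tries at the very start of text
    return m.group("compositeMiddle") if m else None

print(middle_part(' from New York, "Wisdom" str, with id '))   # the whole string
print(middle_part('" from New York, "Wisdom" str, with id '))  # None
```

Keep in mind that [^"] consumes one character, so the middle part must be at least one character long.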