
Matching specific characters individually except in specific expression


Let's say I have a string:
cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat

I want to match the specific symbols _!=+, but not inside <name="Jake_Russell!">, i.e. the part matched by the regex <name=\".+\">. So the result should be __!=+

I've tried a negative lookahead:
(?!<name=\".+\">)([_!=+])
but it still matches the symbols inside <name="Jake_Russell!"> too.


There are 3 best solutions below

1
Trung Duong On BEST ANSWER

I think you could use capturing groups: capture the <name=\".+\"> part in one group that you ignore, and use another group for the specific symbols you want to match.

Regex pattern: (?<ignored_group><name=".+">)|(?<matched_group>[_!=+])

See demo here
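
A minimal sketch of how the two groups might be used, assuming Python (the question does not name a language). Python's re module writes named groups as (?P<name>...), so the pattern is adapted accordingly; the helper name matched_symbols is hypothetical.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r'(?P<ignored_group><name=".+">)|(?P<matched_group>[_!=+])')


def matched_symbols(text: str) -> str:
    """Join every symbol captured by matched_group.

    Matches that went through ignored_group leave matched_group as None,
    so they are simply skipped.
    """
    return "".join(
        m.group("matched_group")
        for m in PATTERN.finditer(text)
        if m.group("matched_group") is not None
    )


print(matched_symbols('cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat'))  # prints __!=+
```

Note that .+ is greedy: if a line held two <name="..."> parts, the ignored group would run from the start of the first to the end of the last, swallowing any symbols in between.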

0
Bohemian On

Because variable-length lookbehinds are not supported by many regex engines, you generally can't exclude matches that appear after particular text.

However, you can exclude a match immediately after <name and exclude matches within quotes, which is the best you can do given the limitations of regex:

(?<!<name)[_!=+](?=(([^"]*"){2})*[^"]*$)

See live demo.
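
As a rough check of this pattern, here is a sketch in Python (an assumption, since the question names no language). The lookahead only succeeds when an even number of double quotes follows the symbol, which is what places it outside a quoted section; the helper name symbols_outside_quotes is hypothetical.

```python
import re

# (?<!<name) rejects the "=" that directly follows "<name".
# The lookahead requires an even number of '"' up to the end of the input,
# so symbols inside a "..." pair are rejected.
PATTERN = re.compile(r'(?<!<name)[_!=+](?=(([^"]*"){2})*[^"]*$)')


def symbols_outside_quotes(text: str) -> str:
    """Return all matched symbols joined into one string."""
    return "".join(m.group(0) for m in PATTERN.finditer(text))


print(symbols_outside_quotes('cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat'))  # prints __!=+
```

This approach excludes symbols inside any quoted text, not only inside <name="...">, and it relies on the quotes in the input being balanced.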

0
The fourth bird On

You can rule out what you don't want, and then capture what you do want, using an alternation and a capture group:

<name="[^"]*">|([_!=+])

Explanation

  • <name= Match literally
  • "[^"]*" Match a double-quoted string: an opening ", then any characters other than " (negated character class), then the closing "
  • > Match literally
  • | Or
  • ([_!=+]) Capture group 1, match any one of the listed characters

Regex demo
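
The explanation above can be sketched in Python as follows (the language is an assumption, and the helper name captured_symbols is hypothetical). Only matches where group 1 took part are kept; a match of the <name="..."> alternative leaves group 1 as None.

```python
import re

# Left alternative consumes the whole <name="..."> part without capturing;
# right alternative captures a single wanted symbol in group 1.
PATTERN = re.compile(r'<name="[^"]*">|([_!=+])')


def captured_symbols(text: str) -> str:
    """Join the group 1 captures, ignoring matches of the <name="..."> part."""
    return "".join(
        m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None
    )


print(captured_symbols('cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat'))  # prints __!=+
```

Because [^"]* cannot cross a closing quote, each <name="..."> part is consumed on its own, so symbols between two such parts on the same line are still captured.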

If the tag can contain more than just name=, and there are no other occurrences of < and > inside it, you might also use:

<[^<>]*\bname="[^"]*"[^<>]*>|([_!=+])

Regex demo
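
A short sketch of this broader pattern, again assuming Python. The sample input with extra id and role attributes is made up for illustration, and the helper name captured_outside_tags is hypothetical.

```python
import re

# Left alternative consumes a whole <...> tag that contains name="...",
# together with any other attributes; group 1 captures symbols elsewhere.
PATTERN = re.compile(r'<[^<>]*\bname="[^"]*"[^<>]*>|([_!=+])')


def captured_outside_tags(text: str) -> str:
    """Join the group 1 captures, ignoring matches of the tag alternative."""
    return "".join(
        m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None
    )


# Hypothetical input: the tag carries attributes before and after name=.
sample = 'cat_dog <user id="7_a" name="Jake_Russell!" role="x=y"> gog+cat'
print(captured_outside_tags(sample))  # prints _+
```

A tag without a name= attribute is not skipped by the first alternative, so symbols inside it are still captured.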