Find and exclude html-tags as whole words in negative lookbehind with regex


I am trying to find all paragraphs in a text (in JavaScript/jQuery) that are not yet wrapped in one of a defined set of HTML tags:

p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe

My current regex (https://regex101.com/r/O4i2hP/1) already matches paragraphs and excludes the defined tags:

(.+?(?<![</(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)>]$))(\n|$)+/gm

but I can't figure out how to match whole tags only.

The problem is:

regex101 reports that [</(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)>] matches a single character in the list </(p|h123456blockquteimgafr)> (case sensitive), rather than a whole tag.

Thus, as you can see from the example, text that is wrapped in other tags, such as <strong>TEXT</strong>, is also excluded.

I tried different things, such as word boundaries (\bword\b), but couldn't get it working. I hope you can help. Thanks.
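The character-class behaviour described above can be reproduced in a few lines of JavaScript. This is a simplified sketch, not the exact regex from the question: the names `asCharClass` and `asGroup` and the sample lines are made up for illustration, and lookbehind requires an engine with ES2018 support.

```javascript
const tags = 'p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe';

// Square brackets create a character class: the lookbehind only checks
// whether the LAST character of the line is one of < / ( p | h 1 ... ) >.
const asCharClass = new RegExp(`.+(?<![</(${tags})>])$`);

// A group keeps the alternation intact: the lookbehind checks whether the
// line ends with a complete closing tag such as </p> or </h1>.
const asGroup = new RegExp(`.+(?<!</(?:${tags})>)$`);

for (const line of ['<p>TEXT</p>', '<strong>TEXT</strong>', 'A plain paragraph.']) {
  console.log(line, asCharClass.test(line), asGroup.test(line));
}
```

With the character class, every line ending in `>` is rejected, including `<strong>TEXT</strong>`; with the group, only lines ending in one of the listed closing tags are rejected.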


There are 2 best solutions below

1
John Williams On BEST ANSWER

This will do it.

^(?!<(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)+?>.*</\1>).*$
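A minimal JavaScript sketch of the lookahead idea, assuming the pattern is meant to skip lines that open with one of the listed tags and contain the matching closing tag. The `.*` inside the lookahead, the `.+` used in place of a trailing `.*` (to avoid empty matches), the single-occurrence tag group, the `gm` flags, and the sample text are all assumptions made for illustration:

```javascript
const tags = 'p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe';

// Negative lookahead at the start of each line: reject the line if it
// begins with <tag> and later contains the matching </tag> (backreference \1).
const unwrapped = new RegExp(`^(?!<(${tags})>.*</\\1>).+$`, 'gm');

const text = [
  '<p>Already wrapped</p>',
  'A bare paragraph.',
  '<strong>TEXT</strong>',
  '<h2>Heading</h2>',
].join('\n');

const matches = text.match(unwrapped);
console.log(matches);
// [ 'A bare paragraph.', '<strong>TEXT</strong>' ]
```

Because the `>` follows the group directly, only whole tag names match; a line starting with `<pre>` or `<strong>` is not treated as `<p>`. Opening tags with attributes and void elements such as `<img>` are not covered by this sketch.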

0
Nixen85 On

I now found a working approach. The tags should be wrapped in groups rather than in character classes. The following works for me:

(.+?(?<!(<\/)(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)(>)$))(\n|$)+/gm

see also: https://regex101.com/r/DC5msM/1
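For reference, a short sketch of how the grouped lookbehind pattern above might be used from JavaScript to collect the unwrapped paragraphs. The sample text and the variable names `re`, `text`, and `found` are hypothetical, and the engine must support lookbehind (ES2018):

```javascript
// Pattern from the answer above, written as a JavaScript regex literal.
const re = /(.+?(?<!(<\/)(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)(>)$))(\n|$)+/gm;

// Hypothetical sample input mixing wrapped and unwrapped lines.
const text = [
  'First bare paragraph.',
  '<p>Wrapped</p>',
  '<strong>TEXT</strong>',
  '',
  '<h1>Title</h1>',
  'Last bare paragraph.',
].join('\n');

// Capture group 1 holds the paragraph without its trailing newlines.
const found = [...text.matchAll(re)].map((m) => m[1]);
console.log(found);
// [ 'First bare paragraph.', '<strong>TEXT</strong>', 'Last bare paragraph.' ]
```

Lines ending in one of the listed closing tags are skipped, while a line wrapped in `<strong>` is still reported as an unwrapped paragraph.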