Is it possible to capture a block of nested HTML with PCRE RegEx?

Asked by Garry Pettet on 31 December 2016 at 08:03

Before you slate me: yes, I know that you shouldn't parse HTML with regex and that you should use a dedicated parser. I don't have that option in the language I'm using (Xojo) and, for various reasons, I need to use RegEx.

I'm trying to capture an entire block of HTML that may or may not contain nested HTML elements. Examples:

<blockquote> This is a blockquote with two paragraphs. Lorem ipsum dolor sit amet,
 consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus.
 Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus.

 Donec sit amet nisl. Aliquam semper ipsum sit amet velit. Suspendisse
 id sem consectetuer libero luctus adipiscing.</blockquote>

 -----------------

<blockquote> This is the first level of quoting.

<blockquote> This is nested blockquote.</blockquote>
 Back to the first level.</blockquote>

 -----------------

<div>
Not nested
</div>

 -----------------

<div>
Top level
<div>Nested</div>
</div>

I came up with this pattern: <(\w*)>([\S\s]*?)<\/\1> It works for flat blocks of HTML, but it fails if the block contains a nested element with the same tag as the parent block: the lazy quantifier stops at the first matching closing tag, which belongs to the inner element. Online example here.
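To make the failure concrete, here is a small reproduction of the pattern above. It is only a sketch: it uses Python's built-in re module rather than PCRE in Xojo (an assumption made purely for illustration; this particular pattern uses no PCRE-only syntax), and the variable names are hypothetical.

```python
import re

# The pattern from the question: a lazy body plus a backreference
# to the name captured in the opening tag.
PATTERN = re.compile(r"<(\w*)>([\S\s]*?)<\/\1>")

# Two of the sample inputs from the question.
flat = "<div>\nNot nested\n</div>"
nested = "<div>\nTop level\n<div>Nested</div>\n</div>"

flat_match = PATTERN.search(flat)
nested_match = PATTERN.search(nested)

# The flat block is captured in full.
print(flat_match.group(0) == flat)

# The nested block is cut short: the lazy body ends at the first
# </div>, which closes the inner element, not the outer one.
print(nested_match.group(0) == nested)
print(repr(nested_match.group(0)))
```

The truncated match ends right after the inner closing tag, so the outer closing tag and everything before it are left out of the capture.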

I'm using the PCRE variant of RegEx and coding in Xojo.

Does anyone have any useful advice on how to solve this problem? Thank you.
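One possible direction, sketched here under stated assumptions rather than offered as a definitive solution: use a simple regex only to find tags, and track nesting depth in code. PCRE does offer recursive subpattern calls such as (?R), but Python's re module does not, so this sketch swaps in a depth counter instead. It is written in Python for illustration, not Xojo; the names TAG and capture_block are hypothetical, and like the pattern in the question it only recognises tags without attributes.

```python
import re

# Matches an attribute-free opening or closing tag, e.g. <div> or </div>.
# Group 1 is "/" for a closing tag, group 2 is the tag name.
TAG = re.compile(r"<(/?)(\w+)>")

def capture_block(text, start=0):
    """Return the first complete element at or after `start`, or None."""
    opener = TAG.search(text, start)
    # Skip any stray closing tags before the first opening tag.
    while opener and opener.group(1):
        opener = TAG.search(text, opener.end())
    if opener is None:
        return None

    name = opener.group(2)
    depth = 1
    pos = opener.end()
    while depth:
        tag = TAG.search(text, pos)
        if tag is None:
            return None  # ran out of input: the block is unbalanced
        if tag.group(2) == name:
            # Same-named tags change the depth; other tags are ignored.
            depth += -1 if tag.group(1) else 1
        pos = tag.end()
    return text[opener.start():pos]

nested = (
    "<blockquote> This is the first level of quoting.\n\n"
    "<blockquote> This is nested blockquote.</blockquote>\n"
    " Back to the first level.</blockquote>"
)
print(capture_block(nested) == nested)
```

The counter only reacts to tags with the same name as the opening tag, so differently named children pass through as part of the body. Whether the same loop is convenient to write with Xojo's RegEx class is not something this sketch verifies.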
