
Regular Expression to exclude a String around the required String


Within some HTML code:

...<div class="..."><a class="..." href="...">I need this String only</a></div>...

How do I write a regular expression (for Rainmeter, which uses Perl-compatible RegEx) such that:

-the required string "I need this String only" is captured in a group so it can be extracted,

-the HTML link tag <a>...</a> may be absent or present, may appear in the middle of the required string, and may appear multiple times.

My attempt:

(?siU) <div class="...">.*[>]{0,1}(.*)[</a>]{0,1}</div> where:

.*= matches every character except newline {<a class ... "}
[>]{0,1}= accepts 0 or 1 occurrences of > {up to >}
(.*)= captures my String
[</a>]{0,1}= accepts 0 or 1 occurrences of </a>

This, of course, doesn't work as I want: the output includes the HTML link markup preceding my string. So my question is:

How do I write a better (and working) RegEx?


There is 1 solution below

joanis On

Even though I agree with the advice to use a real parser for this problem, this regular expression should solve your problem:

<div [^<>]*>(?:[^<>]*<a [^<>]*>)*([^<>]*)(?:</a>)*</div>

Logic:

  • require <div ...> at the beginning and </div> at the end.
  • allow and ignore <a ...> before the matched text arbitrarily many times
  • allow and ignore </a> after the matched text arbitrarily many times
  • ignore any text before any <a ...> with [^<>]* in front of it. Using .* would also work, but it could then skip arbitrary text, all the way up to the last instance of <a ...> in your string.
  • I use [^<>]* instead of .* to match non-tag text in a protected way: since the class excludes literal < and >, a match can never run across a tag boundary.
  • I use (?:...) to group without capturing. If that is not supported in your programming language, just use (...) instead, and adjust which match you use.
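The logic above can be tried out quickly outside Rainmeter. The following is a minimal sketch using Python's `re` module rather than Rainmeter's PCRE engine, assuming the opening part of the pattern is meant to be `<div [^<>]*>`. The function name `extract_text` and the attribute values `"c"`, `"d"`, and `"#"` are made-up placeholders for the parts elided as `...` in the question, and the `(?siU)` flags from the question are not reproduced here.

```python
import re

# Pattern from this answer, assuming the opening tag part is <div [^<>]*>.
# The (?siU) flags from the question are Rainmeter/PCRE-specific and omitted.
PATTERN = re.compile(
    r'<div [^<>]*>(?:[^<>]*<a [^<>]*>)*([^<>]*)(?:</a>)*</div>'
)

def extract_text(html):
    """Return the text captured by group 1, or None if nothing matches."""
    match = PATTERN.search(html)
    return match.group(1) if match else None

# Hypothetical inputs: attribute values stand in for the "..." in the question.
with_link = '<div class="c"><a class="d" href="#">I need this String only</a></div>'
without_link = '<div class="c">I need this String only</div>'

print(extract_text(with_link))
print(extract_text(without_link))
```

Both calls should yield the bare string, whether or not the `<a ...>` wrapper is present, because the link tags sit inside non-capturing groups and only `([^<>]*)` is captured.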

Caveat: this won't be fully general but should work for your problem as described.
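To make the caveat concrete, here is a small sketch of one input this kind of pattern does not handle: a link in the middle of the text. It again uses Python's `re` as a stand-in for Rainmeter's engine and assumes the opening part of the pattern is `<div [^<>]*>`; the sample HTML and the name `mid_link` are invented for illustration.

```python
import re

# Same pattern shape as in the answer; assumes <div [^<>]*> for the opening tag.
PATTERN = re.compile(
    r'<div [^<>]*>(?:[^<>]*<a [^<>]*>)*([^<>]*)(?:</a>)*</div>'
)

# Hypothetical input: the link wraps only the middle word of the text.
mid_link = '<div class="c">I need <a href="#">this</a> String only</div>'

match = PATTERN.search(mid_link)
# The pattern expects </div> right after the optional </a> tags,
# so the trailing " String only" prevents any match.
print(match is None)
```

Text that continues after a closing `</a>` is outside what the pattern allows, which is where a real HTML parser becomes the safer choice.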