Regex to select strings only between specific strings, non-inclusive

Can someone help? I need a regular expression that selects every occurrence of STRING that appears ONLY between STRINGA and STRINGB, regardless of line breaks. I've tried researching this without success, and other "between two strings" questions on here haven't been helpful.

Specifically, I need to select ONLY tags (including the < and > symbols) that appear between h3 tags.

<p>  asdf <strong> ghkjk 
   <strong> qwer </p>
<h3> asdf **<strong>** gh
   kjk **<strong>** qwer </h3>

I can make it select all tags, and I can make it select the full sequence from <h3> to </h3>, but I can't see how to combine those two conditions. (btw, regexr.com is a great tool!) Thanks.

There are 2 best solutions below

HTML is hard to handle with regex, and a regex-based approach can almost always be made to fail in specific (if rare) cases. However, extracting information from simple tags is, for the most part, possible. Extracting information from nested structures, such as a table that contains another table, is where regex starts to crumble.
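
To illustrate the nesting problem, here is a minimal sketch. The sample markup and variable names are made up for illustration. It shows a lazy match stopping at the first closing tag it finds, so the outer table's content is cut short:

```javascript
// Hypothetical sample: a table nested inside another table.
const nestedHtml =
  "<table><tr><td><table><tr><td>inner</td></tr></table>outer</td></tr></table>";

// Lazy match from the first <table> to the nearest </table>.
const tableRe = /<table>([\s\S]*?)<\/table>/;
const tableMatch = nestedHtml.match(tableRe);

// The capture ends at the INNER </table>, so "outer" is never included.
console.log(tableMatch[1]);
```

A greedy quantifier would not fix this in general: it would instead run to the last </table> in the input, which is wrong as soon as there are two sibling tables.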

I came up with (?<=\<[hH]3\>)(.|\s)*?(?=\<\/[hH]3\>), which handles the simple case (<h3>info</h3>).

https://regex101.com/r/qK6uT4/1

Note that this won't work as-is in JavaScript engines without lookbehind support (lookbehind assertions were only added in ES2018), because it relies on a positive lookbehind. The idea is that the lookarounds check for an h3 tag before the match and a /h3 tag after it.

(.|\s)*? matches any character, including whitespace and newlines, as few times as possible, so you don't get everything between one <h3> tag and the </h3> of a later one.
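
As a rough sketch of how the lookarounds and the lazy quantifier behave together, the snippet below runs the regex over a made-up sample with two h3 elements. It assumes a JavaScript engine with lookbehind support (ES2018 or later, for example a recent Node.js); the sample string and variable names are illustrative only.

```javascript
// Assumes a JavaScript engine with lookbehind support (ES2018 or later).
// Hypothetical sample with two separate <h3> elements.
const sample = "<h3>first</h3>\n<p>x</p>\n<h3>second\nline</h3>";

// The regex from above, written as a global regex literal.
const betweenH3 = /(?<=\<[hH]3\>)(.|\s)*?(?=\<\/[hH]3\>)/g;

// Each match is the text between one <h3> and its nearest </h3>.
const contents = Array.from(sample.matchAll(betweenH3), (m) => m[0]);
console.log(contents);
```

Because the quantifier is lazy, the two h3 elements come back as two separate matches, and the <p> between them is not included.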

The previous regex fails on tags with attributes, such as <h3 class="someclass">. If those are relevant,

(?:\<[hH]3(?:\s.*?)?\>)((.|\s)*?)(?=\<\/[hH]3\>)

can be used instead, and the first capture group ($1) holds your result.

https://regex101.com/r/qK6uT4/3
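
One possible way to get from here to what the question asks for (only the tags inside h3 elements) is a two-step pass: capture each h3's content with the regex above, then look for tags inside that capture. This is only a sketch. The tag pattern `tagRe`, the helper name `tagsInsideH3`, and the sample string are assumptions for illustration and are not part of the original answer.

```javascript
// Step 1: the attribute-tolerant regex from above; group 1 is the <h3> content.
const h3Re = /(?:\<[hH]3(?:\s.*?)?\>)((.|\s)*?)(?=\<\/[hH]3\>)/g;
// Step 2 (assumption, not from the answer): a simple pattern for any tag.
const tagRe = /<\/?[a-zA-Z][^>]*>/g;

// Hypothetical helper: collect every tag found inside any <h3>...</h3>.
function tagsInsideH3(html) {
  const tags = [];
  for (const m of html.matchAll(h3Re)) {
    // match() with a global regex returns all matches, or null if none.
    tags.push(...(m[1].match(tagRe) || []));
  }
  return tags;
}

// The sample from the question, without the ** emphasis markers.
const questionSample =
  "<p>  asdf <strong> ghkjk \n   <strong> qwer </p>\n" +
  "<h3> asdf <strong> gh\n   kjk <strong> qwer </h3>";
console.log(tagsInsideH3(questionSample));
```

On the question's sample this yields only the two <strong> tags inside the h3, and skips the ones inside the p. It inherits the usual limits of regex on HTML, so nested or malformed markup may still break it.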

Regex for detecting whether a tablet comes with 4G/3G/LTE:

// Read the product title and flag whether it mentions cellular connectivity.
var txt1 = document.querySelector('.product-title').textContent.trim();

// \b4G\b, \bLTE\b and \b3G\b already cover the space- and comma-delimited forms.
var cellular = /\b(?:4G|LTE|3G)\b| Cellular| Cell /i;

// '1' if the title mentions 4G/3G/LTE/Cellular, otherwise '0'.
var ans = cellular.test(txt1) ? '1' : '0';