Regex to capture everything except the text that is coherent


I have this string and other ones like it:

<a href='/webapps/alrn-atomiclearning-bb_bb60/atomic/view.jsp?courseId=@[email protected]_string@X@&contentId=@[email protected]_string@X@&tt=Using+the+course+calendar&st=Blackboard+Learn%E2%84%A2+9.1+Instructor+-+Additional+Features+Training&d=00:02:09&tid=84425&sid=2389'><img src='/webapps/alrn-atomiclearning-bb_bb60/images/icon_play_UnlockedTutorial.png' alt='play icon'>&nbsp;Using the course calendar</a><br/>Duration: (00:02:09)

I'm trying to come up with a regex to capture everything EXCEPT the coherent label that begins after &nbsp; and ends just before the </a><br/>.

For example, I would capture everything else, delete it, and end up with only:

Using the course calendar

I've tried multiple variations in Rubular but can only match up to the &nbsp;. Using [^a-zA-Z|^\s]*<\/a>.* to skip every word character and whitespace up to the </a> does not work.

Thanks.

Best answer

Use a lookbehind and a lookahead, the two parenthesized sections at either end of the pattern. Modify the character class in the middle to match whatever you want to select.

(?<=>&nbsp;)[a-zA-Z\s]+(?=<\/)
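As a minimal sketch of how this pattern could be applied in Ruby (the flavor Rubular tests against), the snippet below runs it on a shortened, hypothetical version of the string from the question. The names `html` and `LABEL` are illustrative only.

```ruby
# Shortened, hypothetical version of the sample string from the question.
html = "<a href='/view.jsp?tid=84425&sid=2389'>" \
       "<img src='/images/icon_play_UnlockedTutorial.png' alt='play icon'>" \
       "&nbsp;Using the course calendar</a><br/>Duration: (00:02:09)"

# The lookbehind requires ">&nbsp;" immediately before the match and the
# lookahead requires "</" immediately after it; neither is part of the match.
LABEL = /(?<=>&nbsp;)[a-zA-Z\s]+(?=<\/)/

# String#[] with a regex returns the first match, or nil if nothing matches.
label = html[LABEL]
puts label
```

If a string contains several such links, `html.scan(LABEL)` should return all of the labels as an array.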

Edit:

([\s\w\d\S\W\D]+)((?<=>&nbsp;)[a-zA-Z\s]+(?=<\/))\K([\s\w\d\S\W\D]+)

Ultimately this creates three capture groups: the part before the text you want to keep, the text you want to keep, and the part after it. I'm not sure how, or indeed whether, you can select multiple separate matches as if they were a single match.
I'd still go with selecting what you're actually after, if possible.
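If the goal really is to delete everything around the label, one possible alternative to selecting several matches is to match the whole string, capture only the label, and replace the match with that capture. This is a sketch under assumptions, not the pattern above: `KEEP_LABEL`, `html`, and `cleaned` are hypothetical names, and the sample string is shortened.

```ruby
# Shortened, hypothetical version of the sample string from the question.
html = "<a href='/view.jsp?tid=84425'><img src='/images/icon.png' alt='play icon'>" \
       "&nbsp;Using the course calendar</a><br/>Duration: (00:02:09)"

# Match the entire string but capture only the label in group 1.
# In Ruby the /m flag lets "." match newlines as well.
KEEP_LABEL = /\A.*?>&nbsp;([a-zA-Z\s]+)<\/a>.*\z/m

# Replace the whole match with the contents of group 1.
cleaned = html.sub(KEEP_LABEL, '\1')
puts cleaned
```

This keeps a single pattern and a single substitution, at the cost of assuming exactly one label per string.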