Regular Expressions - Select the Second Match


I have a .txt file with <i> and </i> tags between words that I would like to remove using EditPad.

For example, I'd like to keep the tags when a line looks like this:

<i>Phrases and words.</i>

And I'd like to remove the </i> and <i> tags inside the phrase when a line looks like this:

<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>

I tried to do this with a regex, but couldn't get it to work.

Because the inner tags are always next to a space or a word character, I can find the lines that contain the extra tags with

/ <i>|<\/i> /

but with this pattern I can't simply replace every match with nothing; I have to edit each line I find by hand.

Is there any way to accomplish that?

* Edited *

Another example of lines found in the subtitle text:

<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>

Best answer, by AudioBubble

Rule number one: you can't parse HTML with regex.

That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)

If I've understood correctly, it looks like you can simply remove all <i> and </i> that aren't either at the beginning or end of the lines. In that case, one method you could try is the following regex:

(?<=.)\<\/?i\>(?=.)

This will match the tags, using a lookbehind and a lookahead to make sure that we aren't at the start or end of a line (by checking that another character exists behind and in front of the tag). Note that characters matched inside a lookahead or lookbehind are not part of the match, so they won't be replaced when you search/replace.

Disclaimer: this works on regex101, but Notepad++ may have some differences from the PCRE regex style.
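To see the lookaround pattern in action outside an editor, here is a minimal sketch in Python's re module, run on the sample lines from the question. The function name strip_inner_tags is made up for illustration, and Python's regex flavor may not behave exactly like the one in your editor.

```python
import re

# Same pattern as above. In Python's re module the angle brackets and the
# slash need no escaping, so the backslashes are dropped here.
INNER_TAG = re.compile(r"(?<=.)</?i>(?=.)")


def strip_inner_tags(text: str) -> str:
    """Remove <i> and </i> tags that have a character on both sides.

    '.' does not match a newline by default, so a tag at the very start
    or very end of a line is left alone.
    """
    return INNER_TAG.sub("", text)


sample = "\n".join([
    "<i>Phrases and words.</i>",
    "<i>Phrases</i>and<i> words.</i>",
    "<i>Phrases</i>and <i>words.</i>",
])

print(strip_inner_tags(sample))
# <i>Phrases and words.</i>
# <i>Phrasesand words.</i>
# <i>Phrasesand words.</i>
```

One caveat with this rule: a line such as "- What? <i>Go. Go, go, go!</i>" from the question loses its opening tag, because that <i> is not at the start of the line, while the closing </i> at the end is kept.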

Update to work with EditPad

EDIT: since this question is actually about how to do this in EditPad, below is a modified alternative:

Try searching for the regex: (.)\<\/?i\>(.). This will match (and capture) exactly one character before and after the <i> tags.

When replacing, use backreferences to replace the entire match with the two captured characters - a replacement string of \1\2 should work.
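As a rough check of the capture-group variant, the same search and replace can be sketched in Python, where \1\2 is also valid replacement syntax. The function name strip_inner_tags_with_groups is hypothetical, and this only approximates what the editor does.

```python
import re

# One captured character on each side of the tag. Unlike the lookaround
# version, both neighbouring characters are consumed by the match.
TAG_WITH_NEIGHBOURS = re.compile(r"(.)</?i>(.)")


def strip_inner_tags_with_groups(text: str) -> str:
    """Replace 'X<i>Y' or 'X</i>Y' with 'XY' using backreferences."""
    return TAG_WITH_NEIGHBOURS.sub(r"\1\2", text)


print(strip_inner_tags_with_groups("<i>Phrases</i>and<i> words.</i>"))
# <i>Phrasesand words.</i>
print(strip_inner_tags_with_groups("<i>Phrases</i>and <i>words.</i>"))
# <i>Phrasesand words.</i>
```

Because the captured characters are consumed, two tags separated by a single character (for example x</i>a<i>y) are not both removed in one pass; the second tag survives and a second replace-all is needed.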