I have a txt file with <i> and </i> tags between words that I would like to remove using Editpad.
For example, I'd like to keep the tags when a line looks like this:
<i>Phrases and words.</i>
And I'd like to remove the </i> and <i> tags inside the phrase, when it's like this:
<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>
I tried to do that using regex, but I couldn't get it to work.
As the tag is followed by a space or a word character, I could find the lines that have the double tag with
/ <i>|<\/i> /
but this way I can't just replace every match with nothing; I have to edit each line I find by hand.
Is there any way to accomplish that?
* Edited *
Another example of lines found in the subtitle text:
<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>
Rule number one: you can't parse html with regex.
That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)
If I've understood correctly, it looks like you can simply remove all <i> and </i> tags that aren't at either the beginning or the end of a line. In that case, one method you could try is a regex that matches the tags, with a lookahead and a lookbehind to make sure that we aren't at the end/start of a line (by checking whether another character exists in front/behind). (Note that characters matched in a lookahead/lookbehind typically won't be replaced when you search/replace.)
Disclaimer: this works on regex101, but notepad++ may have some differences to the pcre regex style.
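As a rough illustration of the lookaround idea, here is a minimal Python sketch. The pattern is an assumed reconstruction based on the description above, not necessarily the exact regex from the answer, and the names `INNER_TAG` and `strip_inner_tags` are made up for this example. Python's `re` module may also behave differently from Editpad's regex engine.

```python
import re

# Assumption: this pattern is a reconstruction of the lookaround idea described
# above, not necessarily the exact regex from the answer.
# (?<=.) requires a character before the tag on the same line,
# (?=.)  requires a character after the tag on the same line.
INNER_TAG = re.compile(r"(?<=.)</?i>(?=.)")


def strip_inner_tags(text: str) -> str:
    """Remove <i> and </i> tags that are not at the start or end of a line."""
    return INNER_TAG.sub("", text)


if __name__ == "__main__":
    sample = "<i>Phrases</i>and<i> words.</i>\n<i>Phrases and words.</i>"
    print(strip_inner_tags(sample))
```

Because `.` does not match a newline by default, a tag at the very start or end of a line fails one of the two lookarounds and is left alone. One caveat: a line such as `- What? <i>Go. Go, go, go!</i>` has its opening tag in the middle of the line, so this pattern would remove that `<i>` and leave the closing tag behind.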
update to work with Editpad
EDIT: since the question is actually about how to do this in Editpad, below is a modified alternative:
Try searching for the regex:
(.)\<\/?i\>(.)
This will match (and capture) exactly one character before and after the <i> or </i> tag. When replacing, use backreferences to replace the entire match with the two captured characters; a replacement string of
\1\2
should work.
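The same capture-and-backreference replacement can be tried outside the editor. The following is only a sketch in Python, whose `re` module is not the engine Editpad uses; the names `TAG_BETWEEN_CHARS` and `strip_inner_tags_with_groups` are hypothetical, and the angle brackets are left unescaped here because Python does not need them escaped.

```python
import re

# One captured character, then an <i> or </i> tag, then another captured character.
# Assumption: unescaped < and > are used here; the answer's version escapes them.
TAG_BETWEEN_CHARS = re.compile(r"(.)</?i>(.)")


def strip_inner_tags_with_groups(text: str) -> str:
    """Replace 'X<tag>Y' with 'XY', keeping the two surrounding characters."""
    return TAG_BETWEEN_CHARS.sub(r"\1\2", text)


if __name__ == "__main__":
    print(strip_inner_tags_with_groups("<i>Phrases</i>and<i> words.</i>"))
```

Since each match consumes the character on either side of the tag, two tags separated by a single character (for example `a<i>b</i>c`) would not both be removed in one pass; running the replacement a second time should catch the remainder.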