calibre search & replace but maintain a single character


I am trying to remove many occurrences of

</p><p class="calibre1">

in Calibre, but ONLY when this string is immediately followed by a lower case letter. In that case it should be replaced by '' (the empty string) followed by the same lower case letter. (When it is followed by an uppercase letter, a number, or anything else, the string should stay.) In regex, case-sensitive mode I can locate these strings easily with this regex:

</p><p[^>]*>[a-z].....

BUT, of course, I also need to put the matched lower case letter back as it was before. Is there a neat way to do this, or do I need to write my own regex function for that?


There are 2 best solutions below

Best answer

You are looking for a positive lookahead.

Search for </p><p[^>]*>(?=[a-z]) and replace it with [empty string].

The (?=[a-z]) ensures that it matches only if followed by a lower case letter, but it doesn't consume said letter, removing only the </p><p[^>]*> part.
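To see the lookahead at work outside Calibre, here is a minimal sketch using Python's standard re module, whose syntax is close to what Calibre accepts. The sample HTML string and the variable names are made up for illustration. The second pattern shows what should be an equivalent route for anyone who prefers to avoid lookaheads: capture the letter and put it back with a \1 backreference.

```python
import re

# Hypothetical sample text, shaped like the output of a PDF-to-EPUB conversion.
html = (
    '<p class="calibre1">The line was split </p>'
    '<p class="calibre1">across two paragraphs.</p>'
    '<p class="calibre1">Next paragraph.</p>'
)

# Lookahead version: the lower case letter is checked but not consumed,
# so replacing the match with '' leaves the letter in place.
lookahead = re.compile(r'</p><p[^>]*>(?=[a-z])')
joined = lookahead.sub('', html)

# Capture-group version: the letter is consumed, then restored via \1.
capture = re.compile(r'</p><p[^>]*>([a-z])')
joined_alt = capture.sub(r'\1', html)

print(joined)
```

Only the break before "across" is removed; the break before "Next" stays because it is followed by an uppercase letter.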


Update: you may run into problems with newline characters between the tags; in that case, please take a look at @AFK's answer.


I used @Fabian N.'s solution in Calibre to clean up after converting PDFs to EPUB files. I had to modify it just a bit by adding a newline (i.e., \n) between the closing paragraph tag at the end of one line and the opening tag at the beginning of the next line, as seen here:

</p>\n<p class="calibre1">(?=[a-z])

I would have thought that the </p> would have encompassed the newline (\n), but it wasn't matching in Calibre until I added the newline explicitly.

Thanks Fabian for the bit about the positive lookahead; just what I needed.
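The reason the newline has to be spelled out is that </p> in a pattern matches only those four literal characters; a line break between the tags is a separate character that the pattern must account for. A small sketch in Python's re module, again with a made-up sample string, compares the original pattern against a variant that uses \s* to allow optional whitespace between the tags. The \s* variant is my own assumption, not something either answer proposes, and it will also swallow spaces or tabs between the tags.

```python
import re

# Hypothetical sample: each paragraph sits on its own line.
html = (
    '<p class="calibre1">split </p>\n'
    '<p class="calibre1">here.</p>\n'
    '<p class="calibre1">New paragraph.</p>'
)

# Original pattern: fails here because of the newline between the tags.
strict = re.compile(r'</p><p[^>]*>(?=[a-z])')

# Tolerant variant (assumption): \s* permits zero or more whitespace
# characters, including newlines, between </p> and <p ...>.
tolerant = re.compile(r'</p>\s*<p[^>]*>(?=[a-z])')

unchanged = strict.sub('', html)
joined = tolerant.sub('', html)

print(unchanged == html)
print(joined)
```

If the files always have exactly one newline between the tags, the explicit \n shown above is the stricter choice.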