Notepad++ and regex - how to title case string between two particular strings?


I have hundreds of bib references in a file, and they have the following syntax:

@article{tabata1999precise,
  title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts. 
Control of solid structure and $\pi$-conjugation length},
  author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
  journal={Macromolecular chemistry and physics},
  volume={200},
  number={2},
  pages={265--282},
  year={1999},
  publisher={Wiley Online Library}
}

I would like to title case (aka Proper Case) the journal name in Notepad++ using a regular expression. For example, "Macromolecular chemistry and physics" should become "Macromolecular Chemistry and Physics".

I am able to find all instances using:

(?<=journal\=\{).*?(?=\})

but I am unable to change the case via Edit > Convert Case to. Apparently it doesn't work on find all and I have to go one by one.

Next, I tried recording and running a macro but Notepad++ just hangs indefinitely when I try to run it (option to run until the end of the file).

So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate it into my look-aheads.

Thank you in advance, I'd appreciate any help.


There are 2 best solutions below

Best answer

This works for any number of words, and it leaves words shorter than four characters (such as "and" or "of") unchanged:

  • Ctrl+H
  • Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
  • Replace with: \u$1\E$2$3
  • CHECK Wrap around
  • CHECK Regular expression
  • Replace all

Explanation:

(?:             # non capture group
    journal={     # literally
  |              # OR
    \G            # restart from last match position
)               # end group
\K              # forget all we have seen until this position
(?:             # non capture group
    (\w{4,})      # group 1, a word with 4 or more characters
  |              # OR
    (\w+)         # group 2, any shorter word (1 to 3 characters)
)               # end group
(\h*)           # group 3, 0 or more horizontal spaces

Replacement:

\u          # uppercase the first character of what follows
  $1        # content of group 1
\E          # end the case conversion
$2          # content of group 2
$3          # content of group 3
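For checking the result outside Notepad++, the same rule can be approximated in Python. This is only a sketch, not a port of the answer's regex: Python's built-in `re` module has no `\G`, `\K`, or `\u`, so a replacement callback is swapped in. The names `JOURNAL_FIELD`, `capitalize_long_words`, and `title_case_journals` are made up for illustration. One difference to keep in mind: this version processes every word up to the closing brace, whereas the Notepad++ pattern stops at the first character that is neither a word character nor a horizontal space.

```python
import re

# Group 1 is the literal "journal={", group 2 is everything up to the closing brace.
JOURNAL_FIELD = re.compile(r"(journal=\{)([^}]*)")


def capitalize_long_words(name, min_len=4):
    """Uppercase the first character of each word with at least min_len characters."""
    def fix(match):
        word = match.group(0)
        # Shorter words ("and", "of", ...) are returned unchanged.
        return word[0].upper() + word[1:] if len(word) >= min_len else word

    return re.sub(r"\w+", fix, name)


def title_case_journals(bibtex):
    """Apply capitalize_long_words to the contents of every journal={...} field."""
    return JOURNAL_FIELD.sub(
        lambda m: m.group(1) + capitalize_long_words(m.group(2)), bibtex
    )


entry = "  journal={Macromolecular chemistry and physics},\n  volume={200},"
print(title_case_journals(entry))
```

Replacing the length test with a membership test against a set of stop words would give an explicit exclusion list instead of the four-character rule.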


Second answer

If the field always has the form:

journal={Macromolecular chemistry and physics},

i.e. journal followed by exactly 4 words, then use the following:

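Find: journal={(\w+)\s+(\w+)\s+(\w+)\s+(\w+)

(\s+ requires at least one whitespace character between words, so a one-word name such as "Nature" is not split into pieces.)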

Replace with: journal={\u\1 \u\2 \l\3 \u\4

If you have more words, add one more (\w+) group to the Find pattern for each extra word and a matching \u\N to the replacement, where N is the number of that group.

I hope this gives you a starting point for a more general solution.


\u converts the next character to uppercase (used for all words except "and")

\l converts the next character to lowercase (used for the word "and")

\1 inserts the text of the 1st capture group ()

\2 inserts the text of the 2nd capture group ()

\3 inserts the text of the 3rd capture group ()

\4 inserts the text of the 4th capture group ()
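The fixed four-word replacement can be tried in Python as well. This is a rough equivalent rather than the same mechanism: Python's `re` has no `\u` or `\l` in replacement strings, so a callback does the case changes. The names `FOUR_WORDS`, `upper_first`, `lower_first`, and `fix_four_word_journal` are hypothetical, and the pattern assumes at least one whitespace character between words.

```python
import re

# "journal={" followed by exactly four whitespace-separated words.
FOUR_WORDS = re.compile(r"journal=\{(\w+)\s+(\w+)\s+(\w+)\s+(\w+)")


def upper_first(word):
    """Stand-in for \\u: uppercase only the first character."""
    return word[:1].upper() + word[1:]


def lower_first(word):
    """Stand-in for \\l: lowercase only the first character."""
    return word[:1].lower() + word[1:]


def fix_four_word_journal(line):
    """Capitalize words 1, 2 and 4; force a lowercase start on word 3."""
    def repl(match):
        w1, w2, w3, w4 = match.groups()
        words = [upper_first(w1), upper_first(w2), lower_first(w3), upper_first(w4)]
        return "journal={" + " ".join(words)

    return FOUR_WORDS.sub(repl, line)


print(fix_four_word_journal("journal={Macromolecular chemistry and physics},"))
```

Names with fewer than four words do not match and are left as they are, which shows why this approach only suits files where every journal name has the same shape.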