Removing text if part of it appears only once using Notepad++


I'm able to delete text using Regex, but is there a way to do it conditionally if a particular part of the text appears only once in the file?

For example, searching for flag_abc=.*\n returns all of the following results:

flag_abc=10000001
flag_abc=10000002
flag_abc=10000003
flag_abc=10000004
flag_abc=10000005
flag_xyz=10000005
flag_abc=10000006
flag_abc=10000007

10000001, 10000003, 10000004 and 10000006 each appear only once in the file, on their flag_abc= line. However, 10000002, 10000005 and 10000007 appear on more than one line. The number of lines of code between the flags is not consistent. flag_abc= always has the same format, and the number is always 8 digits long. The original code looks like this:

<lines of code>
flag_abc=10000001
<lines of code>
flag_abc=10000002
<lines of code>
property_ghi=10000002
<lines of code>
flag_abc=10000003
<lines of code>
flag_abc=10000004
<lines of code>
flag_abc=10000005
<lines of code>
flag_uvwxyz=10000005
<lines of code>
flag_abc=10000006
<lines of code>
flag_abc=10000007
<lines of code>
10000007{}
<lines of code>

I'm trying to delete every flag_abc=xxxxxxxx line where xxxxxxxx appears only once in the file, i.e. only next to "flag_abc=". If xxxxxxxx appears next to "flag_abc=" and also anywhere else in the code, regardless of location, that line should be left alone. So the code above should end up looking like this:

<lines of code>
<lines of code>
flag_abc=10000002
<lines of code>
property_ghi=10000002
<lines of code>
<lines of code>
<lines of code>
flag_abc=10000005
<lines of code>
flag_uvwxyz=10000005
<lines of code>
<lines of code>
flag_abc=10000007
<lines of code>
10000007{}
<lines of code>

I looked through the Searching part of the manual for NPP but I was unable to find any expressions that checked the uniqueness of a string. Is this even doable using the search expressions?

Answer by Toto:
  • Ctrl+H
  • Find what: ^flag_abc=(\d{8})\b[\s\S]*?\b\1\b(*SKIP)(*FAIL)|^flag_abc=\d{8}\R
  • Replace with: LEAVE EMPTY
  • TICK Wrap around
  • SELECT Regular expression
  • UNTICK . matches newline
  • Replace all

Explanation:

^                 # beginning of line
  flag_abc=       # literally
  (\d{8})         # group 1, 8 digits
  \b              # word boundary, so a 9th digit cannot follow
  [\s\S]*?        # 0 or more of any character (line breaks included), not greedy
  \b              # word boundary
  \1              # backreference to group 1, the same 8-digit number
  \b              # word boundary
  (*SKIP)         # skip this match
  (*FAIL)         # and consider it failed
|                 # OR
^                 # beginning of line
  flag_abc=       # literally
  \d{8}           # 8 digits
  \R              # any kind of line break
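
To try the same idea outside Notepad++, here is a minimal Python sketch. It is not the pattern above: Python's re module has no (*SKIP)(*FAIL), so a negative lookahead stands in for it. The function name remove_forward_unique and the sample text are illustrative assumptions. Like the pattern above, it only looks for a repeat of the number after the flag_abc= line.

```python
import re

# Approximation of the Notepad++ pattern using a negative lookahead,
# because Python's re module does not support (*SKIP)(*FAIL).
PATTERN = re.compile(
    r"^flag_abc=(\d{8})\r?\n"   # a whole flag_abc line, including its line break
    r"(?![\s\S]*\b\1\b)",       # the same number must not occur further down
    re.MULTILINE,
)

def remove_forward_unique(text: str) -> str:
    """Delete flag_abc lines whose number does not reappear later in text."""
    return PATTERN.sub("", text)

# Sample based on the question, with the <lines of code> placeholders left out.
sample = (
    "flag_abc=10000001\n"
    "flag_abc=10000002\n"
    "property_ghi=10000002\n"
    "flag_abc=10000003\n"
    "flag_abc=10000005\n"
    "flag_uvwxyz=10000005\n"
    "flag_abc=10000007\n"
    "10000007{}\n"
)
print(remove_forward_unique(sample))
```

On this sample the lines for 10000001 and 10000003 are dropped and the others are kept. A number that occurs only before its flag_abc= line is not detected by this forward-only check.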

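
If the number can also occur before its flag_abc= line, a forward-only search will miss it. One possible alternative, sketched here in Python rather than as a Notepad++ expression, is to count every standalone 8-digit number in the whole text first and then drop the flag_abc= lines whose number was counted exactly once. The function name remove_unique_flags and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\b\d{8}\b")            # any standalone 8-digit number
FLAG_LINE = re.compile(r"flag_abc=(\d{8})")  # a line that is exactly flag_abc=NNNNNNNN

def remove_unique_flags(text: str) -> str:
    """Drop flag_abc lines whose number occurs exactly once in the whole text."""
    counts = Counter(NUMBER.findall(text))
    kept = []
    for line in text.splitlines(keepends=True):
        match = FLAG_LINE.fullmatch(line.rstrip("\r\n"))
        if match and counts[match.group(1)] == 1:
            continue  # the number appears nowhere else, so remove this line
        kept.append(line)
    return "".join(kept)

# Illustrative input: 10000002 is referenced before its flag line.
text = (
    "property_ghi=10000002\n"
    "flag_abc=10000002\n"
    "flag_abc=10000001\n"
    "flag_abc=10000007\n"
    "10000007{}\n"
)
print(remove_unique_flags(text))
```

Here only flag_abc=10000001 is removed. Counting first makes the result independent of where the second occurrence sits in the file.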