Regular expression for matching either or


I am doing this in Notepad++.

Here's what my data looks like:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|


My regular expression: ^([N|Y]\|.*\|.*[^\x00-\x7F].*\|.*[^\x00-\x7F].*\|)

It only matches rows that have a non-ASCII (accented) character in both the first and the last name.
It does not match a row where only one of the names has such a character.
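A small Python sketch reproduces this behavior. Python's re module stands in for Notepad++ here, which is an assumption. The constructs involved are standard and should behave the same in both engines for these rows. Each [^\x00-\x7F] has to consume its own non-ASCII character, and the two are separated by at least one \|. The line therefore needs accented letters in two different fields, so only the PÉTER and SÁNDAR rows are printed.

```python
import re

# The question's pattern, unchanged. Each [^\x00-\x7F] must consume a
# non-ASCII character, and a literal pipe sits between them, so the line
# needs accented letters in two different fields.
QUESTION_PATTERN = re.compile(r"^([N|Y]\|.*\|.*[^\x00-\x7F].*\|.*[^\x00-\x7F].*\|)")

ROWS = [
    "N|12345|JOHN|TAKÁCSI|blah|blah|",
    "N|12466|PÉTER|VÁLI|blah|blah|",
    "Y|45645|SÁNDAR|SÁKU|blah|blah|",
    "N|89789|DÓRA|MERRY|blah|blah|",
]

# Keep only the rows the pattern accepts.
matched = [row for row in ROWS if QUESTION_PATTERN.match(row)]
for row in matched:
    print(row)
```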

How can I match rows where either name contains one?
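One possible way to express "either name" is an alternation over the two name fields. This is a sketch and is not taken from the answers below. The first branch requires a non-ASCII character somewhere in the first name, and the second branch requires one in the last name. Each field is matched with [^|]*, so no branch can spill across a pipe, and [^\x00-\x7F] can never match the pipe itself because | is ASCII. The field layout (flag, ID, first name, last name, then other columns) is assumed from the sample rows. The two all-ASCII-name rows are hypothetical.

```python
import re

# Hypothetical pattern for rows laid out as: flag|id|first|last|...
# Branch 1: a non-ASCII character somewhere in the first name.
# Branch 2: a non-ASCII character somewhere in the last name.
EITHER_NAME = re.compile(
    r"^[NY]\|\d+\|"
    r"(?:[^|]*[^\x00-\x7F][^|]*\|[^|]*"
    r"|[^|]*\|[^|]*[^\x00-\x7F][^|]*)\|"
)

rows = [
    "N|12345|JOHN|TAKÁCSI|blah|blah|",  # accent in last name only
    "N|12466|PÉTER|VÁLI|blah|blah|",    # accents in both names
    "Y|45645|SÁNDAR|SÁKU|blah|blah|",   # accents in both names
    "N|89789|DÓRA|MERRY|blah|blah|",    # accent in first name only
    "N|11111|JOHN|SMITH|blah|blah|",    # hypothetical: no accents at all
    "N|22222|JOHN|SMITH|blÓh|blah|",    # hypothetical: accent outside the names
]

for row in rows:
    print("match   " if EITHER_NAME.match(row) else "no match", row)
```

In an editor that searches the whole file at once, [^|] can also match a line break. Writing [^|\r\n] instead of [^|] keeps each match on a single line.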

Best answer:

^[NY]\|\d{5}\|(?:[\w_]+[^\x00-\x7F]?[\w_]+\|){2}(?:[\w_]+[\x00-\x7F]?[\w_]+\|){2}$

matches:

N|12345|JOHN|TAKÁCSI|blah|blah|
N|12466|PÉTER|VÁLI|blah|blah|
Y|45645|SÁNDAR|SÁKU|blah|blah|
N|89789|DÓRA|MERRY|blah|blah|

does not match:

N|89789|DÓRA|MERRY|blah|blÓh|
N|89789|DoRA|MERRY|blaÓh|blah|
N|89789|DoRA|MERRY|blaÓh|blÓah|

Your pattern required a non-ASCII character in both names. I made it optional ([^\x00-\x7F]?) in each name field, so a row no longer needs one in both. I also used parts of @HamZa's answer below to adapt this pattern to your data.
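The match and no-match lists above can be checked with a short Python sketch.

One assumption matters: the listed results hold only when \w is ASCII-only, which in Python means passing re.ASCII. Under Python 3's default Unicode \w, [\w_]+ also consumes accented letters, so a field such as blÓh is accepted. How Notepad++ treats \w is not checked here.

Because [^\x00-\x7F]? is optional, the pattern also accepts a row with no accented name at all. The SMITH row is a hypothetical example of this.

```python
import re

# The answer's pattern, verbatim. \w already includes "_", so [\w_]
# behaves the same as \w.
ANSWER = r"^[NY]\|\d{5}\|(?:[\w_]+[^\x00-\x7F]?[\w_]+\|){2}(?:[\w_]+[\x00-\x7F]?[\w_]+\|){2}$"

# ASCII-only \w reproduces the answer's match / no-match lists.
ascii_pattern = re.compile(ANSWER, re.ASCII)
# Python 3's default (Unicode) \w also matches letters such as Ó.
unicode_pattern = re.compile(ANSWER)

should_match = [
    "N|12345|JOHN|TAKÁCSI|blah|blah|",
    "N|12466|PÉTER|VÁLI|blah|blah|",
    "Y|45645|SÁNDAR|SÁKU|blah|blah|",
    "N|89789|DÓRA|MERRY|blah|blah|",
]
should_not_match = [
    "N|89789|DÓRA|MERRY|blah|blÓh|",
    "N|89789|DoRA|MERRY|blaÓh|blah|",
    "N|89789|DoRA|MERRY|blaÓh|blÓah|",
]

# Columns: result with ASCII \w, result with Unicode \w, the row.
for row in should_match + should_not_match:
    print(bool(ascii_pattern.match(row)), bool(unicode_pattern.match(row)), row)

# Hypothetical all-ASCII row: accepted, because [^\x00-\x7F]? is optional.
print(bool(ascii_pattern.match("N|11111|JOHN|SMITH|blah|blah|")))
```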

Other answer:

You could just use: ^[NY]\|\d+(?:\|[^\W_]+){4}\|$

Explanation:

  • ^ : match the beginning of the line
  • [NY] : match either N or Y. You should not use [N|Y]: inside a character class | is a literal character, so [N|Y] would also match a pipe |
  • \| : match a pipe |
  • \d+ : match one or more digits
  • (?: : start a non-capturing group
    • \| : match a pipe |
    • [^\W_]+ : match one or more letters or digits. We could use \w, which matches alphanumeric characters, but it also includes _. To exclude _, we negate the set made of \W (non-word characters) plus _.
  • ){4} : end the group and repeat it 4 times
  • \| : match a pipe |
  • $ : match the end of the line
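A quick Python check covers the pattern and the two points made in the explanation: [N|Y] also accepts a pipe, and [^\W_] excludes the underscore.

This assumes a Unicode-aware \w, which is Python 3's default for str patterns. Without it, [^\W_]+ would reject names such as TAKÁCSI.

The pattern checks only the shape of the row and does not look for non-ASCII characters. An all-ASCII row therefore matches too. The SMITH rows are hypothetical.

```python
import re

# The answer's pattern: N or Y, digits, then four fields of letters/digits
# (Unicode \w minus "_"), each preceded by a pipe, and a trailing pipe.
PATTERN = re.compile(r"^[NY]\|\d+(?:\|[^\W_]+){4}\|$")

rows = [
    "N|12345|JOHN|TAKÁCSI|blah|blah|",
    "N|12466|PÉTER|VÁLI|blah|blah|",
    "Y|45645|SÁNDAR|SÁKU|blah|blah|",
    "N|89789|DÓRA|MERRY|blah|blah|",
]
for row in rows:
    print(bool(PATTERN.match(row)), row)

# Hypothetical rows illustrating the explanation.
print(bool(PATTERN.match("N|11111|JOHN|SMITH|blah|blah|")))   # shape only: True
print(bool(PATTERN.match("N|11111|JO_HN|SMITH|blah|blah|")))  # "_" excluded: False

# Inside a character class "|" is literal, so [N|Y] also accepts a pipe.
print(bool(re.match(r"[N|Y]", "|")), bool(re.match(r"[NY]", "|")))  # True False
```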