Java regex: Matcher.matches() does not work with a positive lookahead pattern; should I use find() instead?

I need a Java regex pattern to validate an input string: it must contain 3 or more letters (A-Z) followed by 7 or more digits, and its total length must be between 10 and 14 characters.

I wrote a pattern and tested it; it has two parts:
  1. A positive lookahead that checks the format (3 or more letters followed by 7 or more digits).
  2. A positive lookahead that checks the overall length of the input.

My pattern: (?=^[A-Z]{3,}[0-9]{7,}$)(?=^[A-Z0-9]{10,14}$)

In Java 8, Matcher.matches() returns false for this pattern, but matcher.find() returns true.

I tried this pattern: (?=^[A-Z]{3,}[0-9]{7,}$)(?=^[A-Z0-9]{10,14}$) with Matcher.matches() and expected it to return true, but it returned false.

With matcher.find() it returns true. However, I also use other patterns that don't have start and end anchors (^ and $), and for those find() returns true (a wrong result) when the input string also contains other characters. So I would rather not use find() unless it is necessary.
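A minimal, self-contained sketch of the behaviour described above, using the question's pattern and `ROM1234567`, one of the inputs that should match. The second pattern, `[A-Z]{3,}[0-9]{7,}` with no anchors, is a hypothetical stand-in for the "other patterns" mentioned; it is not taken from the question.

```java
import java.util.regex.Pattern;

public class LookaheadOnlyDemo {
    // Pattern from the question: two zero-width lookaheads that consume no characters.
    static final Pattern LOOKAHEAD_ONLY =
            Pattern.compile("(?=^[A-Z]{3,}[0-9]{7,}$)(?=^[A-Z0-9]{10,14}$)");

    // Hypothetical stand-in for an "other pattern" that has no ^ or $ anchors.
    static final Pattern UNANCHORED = Pattern.compile("[A-Z]{3,}[0-9]{7,}");

    public static void main(String[] args) {
        String valid = "ROM1234567";
        // matches() needs the match to cover the whole input; this pattern only matches "".
        System.out.println(LOOKAHEAD_ONLY.matcher(valid).matches()); // false
        // find() accepts any match, including the empty one at index 0.
        System.out.println(LOOKAHEAD_ONLY.matcher(valid).find());    // true

        String noisy = "xxROM1234567yy";
        // With no anchors, find() succeeds on a valid-looking substring...
        System.out.println(UNANCHORED.matcher(noisy).find());        // true
        // ...while matches() still rejects the surrounding characters.
        System.out.println(UNANCHORED.matcher(noisy).matches());     // false
    }
}
```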

Inputs that should match: ROM1234567, ROMM1234567, ROM123456789

Inputs that should not match: RO1234567, RO123456, ROM123456, ROM123456789012

Answer by markalex:

Matcher.matches() checks whether the full string matches the provided pattern. But your pattern doesn't actually consume anything: it only matches the empty string at the beginning of the input, because lookaheads (and lookarounds in general) do not consume input.
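To make the "does not consume input" point concrete, the sketch below prints the span of the first match that find() reports for the question's lookahead-only pattern and for the first lookahead's condition written as ordinary, consuming tokens. The helper name `firstMatchSpan` is hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Hypothetical helper: returns {start, end} of the first match found by find(), or null.
    static int[] firstMatchSpan(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        String input = "ROM1234567";
        // Lookaheads only: both conditions are tested, but no characters are consumed.
        Pattern lookaheadOnly =
                Pattern.compile("(?=^[A-Z]{3,}[0-9]{7,}$)(?=^[A-Z0-9]{10,14}$)");
        // The first lookahead's condition written as ordinary, consuming tokens.
        Pattern consuming = Pattern.compile("^[A-Z]{3,}[0-9]{7,}$");

        int[] a = firstMatchSpan(lookaheadOnly, input);
        int[] b = firstMatchSpan(consuming, input);
        System.out.println("lookahead-only: [" + a[0] + ", " + a[1] + ")"); // [0, 0)
        System.out.println("consuming:      [" + b[0] + ", " + b[1] + ")"); // [0, 10)
    }
}
```

The empty span `[0, 0)` is exactly why matches() fails: a match that ends at index 0 cannot cover a 10-character input.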

You can either use a pattern that actually consumes the whole string, like this:

^(?=[A-Z]{3,}[0-9]{7,}$)[A-Z0-9]{10,14}$

or

^(?=[A-Z]{3,}[0-9]{7,}$)(?=[A-Z0-9]{10,14}$).*
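A quick way to check both suggested patterns is to run them with matches() over the inputs listed in the question. This is a sketch written against Java 8 (hence `Arrays.asList` rather than `List.of`); the class and constant names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ConsumingPatternsCheck {
    // First suggestion: lookahead checks the shape, [A-Z0-9]{10,14} consumes and checks length.
    static final Pattern SHAPE_THEN_LENGTH =
            Pattern.compile("^(?=[A-Z]{3,}[0-9]{7,}$)[A-Z0-9]{10,14}$");
    // Second suggestion: both checks as lookaheads, then .* consumes the input.
    static final Pattern LOOKAHEADS_THEN_DOT_STAR =
            Pattern.compile("^(?=[A-Z]{3,}[0-9]{7,}$)(?=[A-Z0-9]{10,14}$).*");

    public static void main(String[] args) {
        // Arrays.asList keeps this Java 8 compatible (List.of needs Java 9+).
        List<String> shouldMatch = Arrays.asList("ROM1234567", "ROMM1234567", "ROM123456789");
        List<String> shouldNotMatch =
                Arrays.asList("RO1234567", "RO123456", "ROM123456", "ROM123456789012");

        for (String s : shouldMatch) {
            System.out.println(s + " -> " + SHAPE_THEN_LENGTH.matcher(s).matches()
                    + " / " + LOOKAHEADS_THEN_DOT_STAR.matcher(s).matches()); // true / true
        }
        for (String s : shouldNotMatch) {
            System.out.println(s + " -> " + SHAPE_THEN_LENGTH.matcher(s).matches()
                    + " / " + LOOKAHEADS_THEN_DOT_STAR.matcher(s).matches()); // false / false
        }
    }
}
```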

With the first pattern, the match covers the full line rather than the empty string at the beginning, which is what your attempt matched.

Or use matcher.find(), since it looks for a matching substring and is perfectly happy with a pattern that matches the empty string at the beginning of the input.
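If you go the find() route with the original pattern, the anchors inside the lookaheads still do the validation: without MULTILINE, `^` only matches at index 0, so find() can only succeed there. A minimal sketch, assuming the question's pattern unchanged; the method name `isValid` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FindWithLookaheads {
    // The question's pattern; each lookahead carries its own ^ and $ anchors.
    static final Pattern ORIGINAL =
            Pattern.compile("(?=^[A-Z]{3,}[0-9]{7,}$)(?=^[A-Z0-9]{10,14}$)");

    // Hypothetical helper: without MULTILINE, ^ only matches at index 0,
    // so find() can only succeed with the empty match there.
    static boolean isValid(String input) {
        return ORIGINAL.matcher(input).find();
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "ROM1234567", "ROMM1234567", "ROM123456789",               // expected true
                "RO1234567", "RO123456", "ROM123456", "ROM123456789012");  // expected false
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

One caveat: in Java, `$` without MULTILINE also matches just before a final line terminator, so `isValid("ROM1234567\n")` returns true as well, whereas the consuming patterns above reject that input under matches(), which must consume every character.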

Answer by Booboo:

You might try this regex:

^(?=.{10,14}$)[A-Z]{3,}[0-9]{7,}\z
  1. ^ - Matches the start of the string.
  2. (?=.{10,14}$) - Positive lookahead asserting that the string contains from 10 to 14 non-newline characters.
  3. [A-Z]{3,}[0-9]{7,} - Matches 3 or more uppercase letters followed by 7 or more digits.
  4. \z - Matches the very end of the input.

Note that I have used \z instead of $. In Java, $ (without MULTILINE) also matches just before a newline at the end of the string, which presumably you do not want as part of the input; the input should consist exclusively of letters and digits. Java's \Z behaves like $ in this respect, so \z is the anchor that rejects a trailing newline. With Matcher.matches() the difference disappears, because the whole input must be consumed; it matters when you use find(). If you know a newline character cannot be entered, or one at the end of the line is acceptable, then use $ instead.
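In Java, `\Z` matches at the end of the input or just before a final line terminator, while `\z` matches only at the very end. The sketch below compares the two anchors on the same pattern, using find() (where the difference is visible) and matches(). The constant names are illustrative only.

```java
import java.util.regex.Pattern;

public class EndAnchorDemo {
    // \Z: end of input, but also just before a final line terminator.
    static final Pattern WITH_BIG_Z = Pattern.compile("^(?=.{10,14}$)[A-Z]{3,}[0-9]{7,}\\Z");
    // \z: the absolute end of input only.
    static final Pattern WITH_SMALL_Z = Pattern.compile("^(?=.{10,14}$)[A-Z]{3,}[0-9]{7,}\\z");

    public static void main(String[] args) {
        String trailingNewline = "ROM1234567\n";
        // find(): \Z accepts the trailing newline, \z rejects it.
        System.out.println(WITH_BIG_Z.matcher(trailingNewline).find());      // true
        System.out.println(WITH_SMALL_Z.matcher(trailingNewline).find());    // false
        // matches() rejects it with either anchor, since "\n" is never consumed.
        System.out.println(WITH_BIG_Z.matcher(trailingNewline).matches());   // false
        System.out.println(WITH_SMALL_Z.matcher(trailingNewline).matches()); // false
        // A clean valid input is accepted.
        System.out.println(WITH_SMALL_Z.matcher("ROM1234567").matches());    // true
    }
}
```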