Java regex positive look-ahead but match unique characters only?


I'm trying to match a String input with the criteria below:

  1. The first 3 characters are unique lowercase English letters
  2. The next 4 characters represent a year from 1500 to 2020
  3. The next characters can only be 10, 100, or 1000
  4. The last character is a digit 0 through 9

The regex I have created, which I believe is mostly correct, is below with an explanation of each part:

String validRegex =
    "^" +                                   // start of string
    "(?=.*[a-z].*[a-z].*[a-z])" +           // Ensure string has only 3 consecutive lowercase English letters
    "(?=.*[0-9].*[0-9].*[0-9].*[0-9])" +    // Ensure string has only 4 digits representing the year, e.g. 2020
    "(?=.*([0-9].*[0-9]) | ([0-9].*[0-9].*[0-9]) | ([0-9].*[0-9].*[0-9].*[0-9]))" + // Ensure 10, 100, or 1000
    "(?=.*[0-9])" +                         // Ensure last character is a digit 0-9
    "(?=\\S+$)" +                           // Ensure string has no whitespace
    ".{10,12}" +                            // Entire string length must be from 10 through 12 characters
    "$";                                    // end of string

Is there a simple way to update my regex so that it only matches unique consecutive characters?


Wiktor Stribiżew (best answer)

Look:

  • The entire input (String) is always 10 to 12 characters long - ^.{10,12}$ (however, in this case you do not need to add this to the overall pattern, because the parts below already sum up to 10, 11 or 12 chars)
  • The first 3 characters are UNIQUE lowercase English letters ([a-z]) - ^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z] (each negative lookahead (?!...) checks that the next letter is not one already captured in group 1 or 2)
  • The next 4 characters represent a year from 1500 to 2020, e.g. 2020 - (?:1[5-9][0-9]{2}|20[01][0-9]|2020)
  • The next characters can only be 10, 100, or 1000 (so at least 2 chars, e.g. 10, and at most 4 chars, e.g. 1000) - [0-9]{2,4} (this checks only the number of digits, not the exact values)
  • The last character is a digit 0 through 9 - [0-9].
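To see what each fragment accepts on its own, here is a minimal sketch that tries them in isolation. The class `RegexParts` and the sample strings are illustrative; the `^` anchor is dropped because `Matcher.matches()` already requires the whole input to match.

```java
import java.util.regex.Pattern;

public class RegexParts {
    // Fragments from the list, without anchors: matches() checks the whole input anyway.
    public static final String UNIQUE_LETTERS = "([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z]";
    public static final String YEAR = "(?:1[5-9][0-9]{2}|20[01][0-9]|2020)";
    public static final String SUFFIX = "[0-9]{2,4}";

    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(UNIQUE_LETTERS, "abc")); // true: all three letters differ
        System.out.println(fullMatch(UNIQUE_LETTERS, "aba")); // false: third letter repeats the first
        System.out.println(fullMatch(UNIQUE_LETTERS, "abb")); // false: third letter repeats the second
        System.out.println(fullMatch(YEAR, "1499"));          // false: below 1500
        System.out.println(fullMatch(YEAR, "2020"));          // true
        System.out.println(fullMatch(YEAR, "2021"));          // false: above 2020
        System.out.println(fullMatch(SUFFIX, "37"));          // true: only the digit count is checked
    }
}
```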

Joining these bits, you get

String regex = "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$";

See the regex demo.
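A minimal sketch of using the combined pattern as a yes/no validator, assuming the whole string should be checked. The class name `CodeValidator` and the sample inputs are made up for illustration; compiling the `Pattern` once in a static field avoids recompiling it on every call.

```java
import java.util.regex.Pattern;

public class CodeValidator {
    // The combined pattern from the answer, compiled once.
    private static final Pattern CODE = Pattern.compile(
        "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$");

    // matches() requires the entire input to fit the pattern.
    public static boolean isValid(String input) {
        return CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abc1904100", "abc202010000", "aac1904100", "abc1499100", "abc19041"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```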

If you plan to support both lower- and uppercase letters, add the case-insensitive modifier (?i) at the start:

String regex = "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$";
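In Java, the inline `(?i)` has the same effect as compiling with the `Pattern.CASE_INSENSITIVE` flag. A short sketch of the flag form (the class name is illustrative), which also shows that the back-references `\1` and `\2` then compare letters case-insensitively, so `a` followed by `A` counts as a repeat:

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDemo {
    // Same pattern as above without the inline (?i); the flag is passed to compile() instead.
    static final Pattern CODE = Pattern.compile(
        "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$",
        Pattern.CASE_INSENSITIVE);

    public static boolean isValid(String input) {
        return CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Mixed-case letters are accepted.
        System.out.println(isValid("AbC1904100"));
        // Back-references ignore case too, so "a" then "A" is treated as a repeated letter.
        System.out.println(isValid("aAb1904100"));
    }
}
```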

If there can be a letter at the end, not just a digit, you may use

String regex = "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9a-z]$";

See this regex demo.

To build regex number ranges, you may use well-known services such as gamon.webfactional.com, richie-bendall.ml, or MyRegexTester.com.
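Whatever tool generates a range pattern, it is cheap to double-check it by brute force. A sketch (class and method names are illustrative) that compares the year alternation used above with every 4-digit number from 1000 to 9999 and collects any disagreements:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class YearRangeCheck {
    // The year alternation from the answer.
    public static final Pattern YEAR = Pattern.compile("1[5-9][0-9]{2}|20[01][0-9]|2020");

    // Returns every 4-digit number whose match result disagrees with the range lo..hi.
    public static List<Integer> mismatches(Pattern p, int lo, int hi) {
        List<Integer> bad = new ArrayList<>();
        for (int n = 1000; n <= 9999; n++) {
            boolean matched = p.matcher(Integer.toString(n)).matches();
            boolean expected = n >= lo && n <= hi;
            if (matched != expected) {
                bad.add(n);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Integer> bad = mismatches(YEAR, 1500, 2020);
        System.out.println(bad.isEmpty() ? "range regex is exact" : "mismatches: " + bad);
    }
}
```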

See the Java demo:

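import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Groups: 1 = the three letters, 2 and 3 = the first two letters (targets of the
        // back-references), 4 = year, 5 = the 2-4 digit part, 6 = last character
        String regex = "(?i)(([a-z])(?!\\2)([a-z])(?!\\2|\\3)[a-z])(1[5-9][0-9]{2}|20[01][0-9]|2020)([0-9]{2,4})([0-9a-z])";
        String s = "AVG190420T";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(s);
        // matches() requires the whole string to fit; find() would also accept extra surrounding text
        if (matcher.matches()) {
            System.out.println("Part 1: " + matcher.group(1));
            System.out.println("Part 2: " + matcher.group(4));
            System.out.println("Part 3: " + matcher.group(5));
            System.out.println("Part 4: " + matcher.group(6));
        } else {
            System.out.println(s + " does not match the pattern.");
        }
    }
}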

Output:

Part 1: AVG
Part 2: 1904
Part 3: 20
Part 4: T
Nowhere Man

The following regex does not use lookaheads, but it seems to validate the initial requirements more closely:

^(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz)(1[5-9]\d{2}|20[0-1]\d|2020)10{1,3}\d$

Online demo

The 1st group (abc|bcd|...|xyz) validates three unique, alphabetically consecutive lowercase letters.
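Rather than typing the 24 alternatives by hand, the group can be generated. A small sketch (the class name is illustrative) that builds the same string by taking each letter from `a` to `x` together with the two letters after it:

```java
import java.util.StringJoiner;

public class ConsecutiveLetters {
    // Builds "(abc|bcd|...|xyz)": every run of three alphabetically consecutive letters.
    public static String alternation() {
        StringJoiner group = new StringJoiner("|", "(", ")");
        for (char c = 'a'; c <= 'x'; c++) {
            group.add("" + c + (char) (c + 1) + (char) (c + 2));
        }
        return group.toString();
    }

    public static void main(String[] args) {
        System.out.println(alternation());
    }
}
```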

The 2nd group, (1[5-9]\d{2}|20[01]\d|2020), validates the year: it matches 1500 through 2020.

The remaining numeric suffix is validated as follows:

  • 10{1,3} matches 10, 100, or 1000
  • \d matches the closing digit
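From Java, the same pattern needs its backslashes doubled inside the string literal. A sketch with an illustrative class name and sample inputs; unlike `[0-9]{2,4}`, the `10{1,3}` part rejects a suffix that does not start with 1, such as `200`:

```java
import java.util.regex.Pattern;

public class ConsecutiveCodeValidator {
    // The pattern from this answer; each \d is written as \\d inside the Java string literal.
    static final Pattern CODE = Pattern.compile(
        "^(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz)"
        + "(1[5-9]\\d{2}|20[0-1]\\d|2020)10{1,3}\\d$");

    public static boolean isValid(String input) {
        return CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc1904100", "xyz202010005", "abd1904100", "abc1904200"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```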

Update
For the year range 1900..2019, the pattern is (19\d{2}|20[01]\d). For numbers like 10, 20, 50, 100, 200, 500, 1000, the pattern is (10{1,3}|[25]0{1,2}).
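Both updated fragments can be checked against the values listed above. A sketch, with an illustrative class name and out-of-range samples, that uses each fragment as a whole-string pattern via `matches()`:

```java
import java.util.regex.Pattern;

public class UpdatedPatterns {
    public static final Pattern YEAR = Pattern.compile("19\\d{2}|20[01]\\d");
    public static final Pattern AMOUNT = Pattern.compile("10{1,3}|[25]0{1,2}");

    public static boolean fullMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Years 1900..2019 should match; neighbours just outside the range should not.
        for (String y : new String[] {"1899", "1900", "2019", "2020"}) {
            System.out.println(y + " -> " + fullMatch(YEAR, y));
        }
        // 10, 20, 50, 100, 200, 500, 1000 should match; 2000 and 30 should not.
        for (String a : new String[] {"10", "20", "50", "100", "200", "500", "1000", "2000", "30"}) {
            System.out.println(a + " -> " + fullMatch(AMOUNT, a));
        }
    }
}
```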

Updated online demo