Java split regex non-greedy match not working


Why is the non-greedy match not working for me? Take the following example:

public String nonGreedy(){
   String str2 = "abc|s:0:\"gef\";s:2:\"ced\"";
   return str2.split(":.*?ced")[0];
}

In my eyes the result should be abc|s:0:"gef";s:2, but it is abc|s.


There are 2 best solutions below

BEST ANSWER

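The .*? in your regex matches any character except a line terminator, zero or more times, as few times as possible. That includes colons. The regex engine also looks for the leftmost match first, so the match starts at the first colon, and laziness only decides where it ends: at the first ced after that colon.

To make the match start at the colon closest to ced, exclude colons from the part between the colon and ced: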

:[^:]*?ced
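
To see where each pattern actually starts and ends, it can help to run both through a Matcher. The following is only a small sketch; the class name LazyMatchDemo and the helper firstMatch are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchDemo {
    // Hypothetical helper: returns the first substring of input matched by regex,
    // or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "abc|s:0:\"gef\";s:2:\"ced\"";
        // The match begins at the first colon; laziness only stops it at the first "ced".
        System.out.println(firstMatch(":.*?ced", input));    // prints :0:"gef";s:2:"ced
        // With colons excluded, only the last colon can start a successful match.
        System.out.println(firstMatch(":[^:]*?ced", input)); // prints :"ced
    }
}
```

Since split removes whatever the pattern matched, the first pattern leaves only abc|s in front, while the second leaves everything up to the last colon.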

On another note, you should use a constant Pattern to avoid recompiling the expression every time, something like:

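import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitExample {
    // Compiled once and reused, instead of being recompiled on every split call.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile(":[^:]*?ced");

    public static void main(String[] args) {
        String input = "abc|s:0:\"gef\";s:2:\"ced\"";
        System.out.println(Arrays.toString(
            REGEX_PATTERN.split(input)
        )); // prints "[abc|s:0:"gef";s:2, "]"
    }
}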

ANOTHER ANSWER

It is behaving as expected. A non-greedy quantifier matches as little as it has to, but the match still begins at the first position where it can succeed. With your input, that is the first colon, so the match runs from the first colon to the next ced.

You could try limiting the number of characters consumed. For example, to limit the term to "up to 3 characters":

:.{0,3}ced
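
A quick sketch of how the bounded version behaves with split. The class name BoundedSplitDemo and the second input string are assumptions made up for illustration; the second input shows the limitation that a longer gap between the colon and ced produces no match at all.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BoundedSplitDemo {
    // At most 3 characters may sit between the colon and "ced".
    private static final Pattern BOUNDED = Pattern.compile(":.{0,3}ced");

    static String[] split(String input) {
        return BOUNDED.split(input);
    }

    public static void main(String[] args) {
        // Only the last colon is within 3 characters of "ced".
        System.out.println(Arrays.toString(split("abc|s:0:\"gef\";s:2:\"ced\"")));
        // prints [abc|s:0:"gef";s:2, "]

        // Made-up input with 5 characters between the last colon and "ced":
        // nothing matches, so split returns the whole string unchanged.
        System.out.println(Arrays.toString(split("s:2:\"xxxxced\"")));
        // prints [s:2:"xxxxced"]
    }
}
```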

To make it split as close to ced as possible, use a negative look-ahead, with this regex:

:(?!.*:.*ced).*ced

The look-ahead rejects any colon that is followed by another colon before ced, so only the colon closest to ced can start the match.
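
As a rough check, the look-ahead pattern can be tried against the input from the question. The class name LookaheadSplitDemo is invented for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LookaheadSplitDemo {
    // (?!.*:.*ced) fails at any colon that is followed, later on the same line,
    // by another colon and then "ced".
    private static final Pattern NEAREST_COLON =
            Pattern.compile(":(?!.*:.*ced).*ced");

    static String[] split(String input) {
        return NEAREST_COLON.split(input);
    }

    public static void main(String[] args) {
        String input = "abc|s:0:\"gef\";s:2:\"ced\"";
        // The first three colons are rejected by the look-ahead; the fourth matches.
        System.out.println(Arrays.toString(split(input)));
        // prints [abc|s:0:"gef";s:2, "]
    }
}
```

Note that the trailing .*ced is greedy, so if ced occurs several times after the last colon, the match extends to the last occurrence.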