Java regex metacharacters returning extra space while splitting


I want to split a string using a regex instead of StringTokenizer, so I am using String.split(regex). The regex contains metacharacters, and when I include \[ the returned array contains an extra space.

import java.util.Scanner;
public class Solution{
    public static void main(String[] args) {
        Scanner i= new Scanner(System.in);
        String s= i.nextLine();
        String[] st=s.split("[!\\[,?\\._'@\\+\\]\\s\\\\]+");
        System.out.println(st.length);
        for(String z:st)
            System.out.println(z);
    }
}

When I enter the input [a\m], it returns an array of length 3 and prints:

 a m  

An extra space also appears before a. Can anyone please explain why this happens and how I can correct it? I don't want the extra space in the resulting array.


There are 2 best solutions below

On BEST ANSWER

Since the [ is at the beginning of the string, the first split step produces two elements: the empty string that precedes the [, and the rest of the string. String#split only discards trailing empty strings (it runs with limit=0 by default), so the leading empty string stays in the result. What looks like an extra space in the output is that empty element printed on its own line.
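To make the leading empty element visible, here is a minimal sketch that prints the raw split result as an array. The class name LeadingEmptyDemo and the helper splitRaw are made up for illustration; the character class is the one from the question, minus the escapes that are redundant inside [...].

```java
import java.util.Arrays;

public class LeadingEmptyDemo {
    // Separator class from the question; '.' and '+' need no escaping inside [...]
    static final String SEPARATORS = "[!\\[,?._'@+\\]\\s\\\\]+";

    // Hypothetical helper: plain String.split with the default limit of 0
    static String[] splitRaw(String input) {
        return input.split(SEPARATORS);
    }

    public static void main(String[] args) {
        // The Java literal "[a\\m]" is the five-character text [a\m]
        System.out.println(Arrays.toString(splitRaw("[a\\m]"))); // prints [, a, m]
        // Without the leading separator there is no empty first element
        System.out.println(Arrays.toString(splitRaw("a\\m]")));  // prints [a, m]
    }
}
```

The trailing ] produces no empty element at the end because trailing empty strings are dropped, while the leading one is kept.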

Remove the characters you split on from the start of the string first, using .replaceAll("^[!\\[,?._'@+\\]\\s\\\\]+", "") (note the ^ at the beginning of the pattern). Here is some sample code you can adapt:

String[] st="[a\\m]".replaceAll("^[!\\[,?._'@+\\]\\s\\\\]+", "")
                 .split("[!\\[,?._'@+\\]\\s\\\\]+");
System.out.println(st.length);
for(String z:st) {
    System.out.println(z);
}



As an addition to Wiktor Stribiżew’s answer, you may do the same without having to specify the pattern twice, by dealing with the java.util.regex package directly. Removing this redundancy may avoid potential errors and may also be more efficient as the pattern doesn’t need to be parsed twice:

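// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("[!\\[,?\\._'@\\+\\]\\s\\\\]+");
Matcher m = p.matcher(s);
// strip a leading run of separators, if any, so split yields no leading empty string
if(m.lookingAt()) s=m.replaceFirst("");
String[] st = p.split(s);
for(String z:st)
    System.out.println(z);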

To be able to use the same pattern, i.e. without having to use the anchor ^ for removing a leading separator, we first check via lookingAt() whether the pattern really matches at the beginning of the text before removing the first occurrence. Then, we proceed with the split operation, but reusing the already prepared Pattern.


Regarding the issue you mentioned in a comment: when there is no match, the split operation always returns at least one element, the input string itself, even when that string is empty. If you want an empty array in that case, the only solution is to replace the result explicitly:

if(st.length==1 && s.equals(st[0])) st=new String[0];

or, if you only want to treat an empty string specially, you may check this beforehand:

if(s.isEmpty()) st=new String[0];
else {
  // the code as shown above
}
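
Putting the pieces of this answer together, a self-contained sketch could look like the following. The class name Tokenizer and the method tokens are hypothetical, and this variant only treats the empty string specially (the second option above), so an input without any separator still comes back as a single element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // Compiled once and reused for both the leading-separator check and the split
    private static final Pattern SEPARATORS =
            Pattern.compile("[!\\[,?._'@+\\]\\s\\\\]+");

    // Hypothetical helper: splits s on the separators without a leading empty element
    static String[] tokens(String s) {
        Matcher m = SEPARATORS.matcher(s);
        // lookingAt() succeeds only if a separator run starts at index 0
        if (m.lookingAt()) {
            s = m.replaceFirst("");
        }
        // split("") would return one empty element, so handle that case explicitly
        if (s.isEmpty()) {
            return new String[0];
        }
        return SEPARATORS.split(s);
    }

    public static void main(String[] args) {
        for (String z : tokens("[a\\m]")) {
            System.out.println(z);
        }
    }
}
```

With this helper, an input that consists only of separators, such as [], yields an empty array, because the string is empty once the leading run has been removed.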