Buffered reader - remove punctuation

I need help with a reader that removes punctuation and numbers and creates an array of strings from the input.

For example, the input will be an "example.txt" file containing something like this:

"Hello 123 , I'am new example ... text file!"

I need my reader to create an array containing this:

String[] example = {"Hello", "I", "am", "new", "example", "text", "file"};

Is there a way to remove punctuation and numbers and create an array of strings with a buffered reader?

Thank you in advance, Fipkus.

There are 3 best solutions below

0
On BEST ANSWER

In the end, I fixed it like this:

char[] alphabet= {'a','á','b','c','č','d','ď','e','é','ě','f','g','h',
            'i','í','j','k','l','m','n','ň','o','ó','p','q','r','ř','s','š','t','ť',
            'u','ú','ů','v','w','x','y','ý','z','ž','A','Á','B','C','Č','D','Ď','E','É',
            'Ě','F','G','H','I','Í','J','K','L','M','N','Ň','O','Ó','P','Q','R','Ř','S','Š','T',
            'Ť','U','Ú','Ů','V','W','X','Y','Ý','Z','Ž',' '};



String vlozena = userInputScanner.nextLine();
String fileContentsSingle = "";
int length = vlozena.length();
int j;
char cha;

/*
 * Checks whether the character is a space or a letter of the Czech alphabet
 * and, if it is, appends it to the sentence.
 */
for (j = 0; j < length; j++) {
    cha = vlozena.charAt(j);
    for (char z : alphabet) {
        if (cha == z) {
            fileContentsSingle = fileContentsSingle + cha;
        }
    }
}

// Collapse runs of whitespace, lower-case the text, then split it into words.
fileContentsSingle = fileContentsSingle.replaceAll("\\s+", " ");
fileContentsSingle = fileContentsSingle.toLowerCase();
String[] vetaNaArraySingle = fileContentsSingle.split("\\s+",-1);
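
The same filtering idea can probably be written without a hand-maintained alphabet. The following is only a sketch, not the code from the answer above: it assumes that "letter" may mean anything `Character.isLetter` accepts (which includes the Czech accented letters), and the class and method names `LetterFilter`, `keepLettersAndSpaces` and `toWords` are made up for illustration. Like the loop above, it deletes the apostrophe, so "I'am" comes out as one word.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: names are illustrative, not from the original answer.
final class LetterFilter {

    // Keeps letters (of any alphabet) and whitespace; drops digits and punctuation.
    static String keepLettersAndSpaces(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c) || Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Filters the input, lower-cases it and splits it on runs of whitespace.
    static String[] toWords(String input) {
        String cleaned = keepLettersAndSpaces(input).trim().toLowerCase(Locale.ROOT);
        if (cleaned.isEmpty()) {
            return new String[0]; // avoid returning a single empty string
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = toWords("Hello 123 , I'am new example ... text file!");
        System.out.println(Arrays.toString(words));
        // prints [hello, iam, new, example, text, file]
    }
}
```

Using a `StringBuilder` avoids creating a new string for every kept character, and `trim()` keeps leading or trailing spaces from producing empty array entries.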
2
On

Another method is using StringTokenizer. It's a little more restrictive, but I prefer it since you just list the delimiters instead of regex, which is a little easier to read.

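import java.util.ArrayList;
import java.util.StringTokenizer;

String test = "Hello 123 , I'am new example ... text file!";
ArrayList<String> exampleTemp = new ArrayList<>();

// Delimiters are listed character by character; no regex is involved.
StringTokenizer st = new StringTokenizer(test, " ,.1234567890!");
while (st.hasMoreTokens())
{
    exampleTemp.add(st.nextToken());
}
// Size the array from the list instead of hard-coding 6 elements.
String[] example = exampleTemp.toArray(new String[0]);

for (String word : example)
{
    System.out.println(word);
}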

Edit: I modified it to fill a String array. Not sure about the white space issue.
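
Since the question asks about a buffered reader, here is a rough sketch of how the tokenizer could be combined with one. It is an illustration under assumptions: the class name `TokenizerReader`, the method `readWords` and the exact delimiter list (which adds the apostrophe so that "I'am" splits into "I" and "am") are not from the answer above. A `StringReader` stands in for the file so the example runs on its own; for a real file, something like `Files.newBufferedReader(Path.of("example.txt"))` could be passed instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: names and delimiter list are assumptions for illustration.
final class TokenizerReader {

    // Every character listed here is treated as a separator and discarded.
    private static final String DELIMS = " \t,.!?;:'\"0123456789";

    // Reads all lines from the reader and collects the tokens of each line.
    static String[] readWords(BufferedReader reader) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line, DELIMS);
            while (st.hasMoreTokens()) {
                words.add(st.nextToken());
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        String content = "Hello 123 , I'am new example ... text file!";
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            System.out.println(Arrays.toString(readWords(reader)));
            // prints [Hello, I, am, new, example, text, file]
        }
    }
}
```

Because `StringTokenizer` never returns empty tokens, consecutive delimiters such as "123 , " need no special handling.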

1
On

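Use String.split(regex). In the regex, put the characters you want to remove inside a character class followed by +, for example String regex = "[ ,0123456789.!]+". Without the square brackets, a pattern such as ",0123456789\\." would match only that exact character sequence, not each character on its own.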
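
A minimal sketch of the split approach follows. Instead of listing every unwanted character, it assumes it is acceptable to split on any run of non-letters using the Unicode letter class `\p{L}`; the class name `SplitWords` and the method `splitWords` are made up for this example.

```java
import java.util.Arrays;

// Hypothetical helper: names are illustrative only.
final class SplitWords {

    // Splits on every run of characters that are not Unicode letters.
    static String[] splitWords(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(s -> !s.isEmpty()) // a leading separator yields one empty string
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] words = splitWords("Hello 123 , I'am new example ... text file!");
        System.out.println(Arrays.toString(words));
        // prints [Hello, I, am, new, example, text, file]
    }
}
```

The filter step is there because `split` keeps an empty first element when the text starts with a separator, while trailing empty elements are dropped automatically.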