Correct string format after splitting and stemming


I have a text file and I am trying to use a stemmer.

A stemmer strips words of their suffixes. For example, "having had have" would become "have have have" after stemming. To do that, the string has to be split first, because the stemmer can only process one word at a time. After splitting and stemming, however, the output looks like this: "havehavehave". How can I put the spaces back so the output has the right format?

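import java.util.Scanner;
import org.tartarus.snowball.ext.englishStemmer; // Snowball's Java stemmer

englishStemmer english = new englishStemmer();

Scanner inputFile = new Scanner(file); //The text of file is "having have had" or something similar
String[] text = inputFile.nextLine().split("\\s");

// Stem each word separately: the stemmer works on one word at a time
for (int i = 0; i < text.length; i++) {
    english.setCurrent(text[i]);
    english.stem();
    System.out.print(english.getCurrent());
}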
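A side note on the splitting step: the code above splits with "\\s", which matches exactly one whitespace character. If a line ever contains a leading space or two spaces in a row, that leaves empty tokens in the array, and those would be "stemmed" and printed too. The following is a minimal sketch that contrasts it with trimming first and splitting on "\\s+" (runs of whitespace). The class and method names (SplitDemo, words) are hypothetical, and the sketch does not use the Snowball stemmer at all.

```java
import java.util.Arrays;

class SplitDemo {
    // Trims the line, then splits on runs of whitespace, so leading or
    // repeated spaces do not produce empty tokens. A blank line yields
    // an empty array instead of [""].
    static String[] words(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = " having  had have"; // leading space and a double space

        // "\\s" matches a single whitespace character, so the leading space
        // and the double space each leave an empty token behind.
        System.out.println(Arrays.toString(line.split("\\s"))); // prints: [, having, , had, have]

        System.out.println(Arrays.toString(words(line))); // prints: [having, had, have]
    }
}
```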

Accepted answer

Assuming the output you are looking at is what you print with System.out.print, you just need to append a space after each word, as in System.out.print(english.getCurrent() + " ");. If you want to avoid a trailing space after the last word, wrap it in an if statement:

if( i < text.length -1 )
{
    System.out.print(english.getCurrent() + " ");
} else {
    System.out.print(english.getCurrent());
}
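An equivalent pattern, sketched here as one possible alternative, writes the space *before* every word except the first, so no special case is needed for the last iteration. The class and method names are hypothetical, and the hard-coded array is only a stand-in for the values returned by english.getCurrent().

```java
class SpacedOutput {
    // Appends a space before every word except the first, which has the
    // same effect as the if/else above (no trailing space at the end).
    static String joinWords(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the stems returned by english.getCurrent()
        String[] stems = {"have", "have", "have"};
        System.out.println(joinWords(stems)); // prints: have have have
    }
}
```

Checking `i > 0` at the start of the loop body is often easier to get right than checking for the last index, because the first index is always 0 regardless of the array length.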
Answer

You could collect the stemmed words (what getCurrent() returns after each stem() call) and pass them to a separate output() method that puts a " " character between the words. You'd have to decide which data structure to hold the stemmed words in. You would then get output like "have have have" rather than "havehavehave".
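One way the suggested output() method might look, assuming the stems are held in a List<String>, is the following sketch. Both output() and the StemOutput class are hypothetical names; java.util.StringJoiner (Java 8+) inserts the delimiter only between elements. The list in main is a stand-in for real stemmer results.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class StemOutput {
    // Hypothetical output() helper: takes the stemmed words and
    // returns them as one string separated by single spaces.
    static String output(List<String> stems) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String stem : stems) {
            joiner.add(stem);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Stand-in values for the stemmer's results
        List<String> stems = Arrays.asList("have", "have", "have");
        System.out.println(output(stems)); // prints: have have have
    }
}
```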

Answer

How about saving all the stemmed words in an ArrayList? Then you could iterate over the ArrayList and output them however you like. Borrowing from your code with some simple modifications:

import java.util.ArrayList;
import java.util.Scanner;

englishStemmer english = new englishStemmer();
Scanner inputFile = new Scanner(file); //The text of file is "having have had" or something similar
String[] text = inputFile.nextLine().split("\\s");
ArrayList<String> stemmedWords = new ArrayList<String>();

// First pass: stem every word and keep the results
for (int i = 0; i < text.length; i++) {
    english.setCurrent(text[i]);
    english.stem();
    String stem = english.getCurrent();
    stemmedWords.add(stem);
}

// Second pass: print the stems, one per line
for (String stem : stemmedWords) {
    System.out.println(stem);
}
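If the stems should end up on a single line ("have have have") instead of one per line, the second loop could be replaced by String.join (Java 8+), which places the delimiter only between elements. A minimal sketch follows; toLine and JoinStems are hypothetical names, and the hard-coded list only stands in for the stemmedWords collected above.

```java
import java.util.ArrayList;
import java.util.List;

class JoinStems {
    // Joins the collected stems with single spaces; no trailing space is added.
    static String toLine(List<String> stemmedWords) {
        return String.join(" ", stemmedWords);
    }

    public static void main(String[] args) {
        // Stand-in values; in the answer above they come from english.getCurrent()
        List<String> stemmedWords = new ArrayList<>();
        stemmedWords.add("have");
        stemmedWords.add("had");
        stemmedWords.add("have");
        System.out.println(toLine(stemmedWords)); // prints: have had have
    }
}
```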

Alternatively,

for (int i = 0; i < text.length; i++) {
    english.setCurrent(text[i]);
    english.stem();
    System.out.print(english.getCurrent());
    System.out.print(" "); // leaves one trailing space after the last word
}
System.out.println(); // Optionally ends the line once all words are printed