trim() vs indexOf()


I am parsing hundreds of files, each containing thousands of lines.

I have to check whether each line starts with one of a set of keywords.

I have two options and am not sure which one to choose.

Option 1:

    String[] keywordsArr = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
    for (int i = 0; i < linesOfCode.length; i++) {
        for (String keyWord : keywordsArr) {
            // Pre-check: only trim if the keyword occurs somewhere in the line.
            if (linesOfCode[i].indexOf(keyWord) > -1) {
                if (linesOfCode[i].trim().startsWith(keyWord)) {
                    linesOfCode[i] = "";
                    break;
                }
            }
        }
    }

Option 2:

    String[] keywordsArr = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
    for (int i = 0; i < linesOfCode.length; i++) {
        for (String keyWord : keywordsArr) {
            if (linesOfCode[i].trim().startsWith(keyWord)) {
                linesOfCode[i] = "";
                break;
            }
        }
    }

Roughly 1 line in 100 starts with one of the keywords.


There are 3 best solutions below

1
On

Try using continue instead of break. Rather than terminating the loop, continue skips the rest of the current iteration and moves on to the next item.

0
On

There is little point scanning the entire string for a keyword just to avoid looking for the keyword at the beginning of the string. If the idea was to avoid an expensive trim, then it might be reasonable to use a cheaper technique to find the first token in the line.
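One cheaper technique along these lines is to locate the first non-blank character and compare in place with String.startsWith(prefix, offset), which avoids allocating a trimmed copy of each line. The following is only a sketch: the class name KeywordPrefix and its helper methods are made up for illustration, and "blank" is assumed to mean the same characters that trim() strips (anything up to U+0020).

```java
final class KeywordPrefix {

    /** Index of the first character that trim() would keep, or line.length() if there is none. */
    static int firstNonBlank(String line) {
        int i = 0;
        // String.trim() strips every character <= U+0020, so the same test is used here.
        while (i < line.length() && line.charAt(i) <= ' ') {
            i++;
        }
        return i;
    }

    /** True if the line, ignoring leading blanks, starts with any of the keywords. */
    static boolean startsWithKeyword(String line, String[] keywords) {
        int start = firstNonBlank(line);
        for (String keyword : keywords) {
            // Compares in place at the given offset; no trimmed copy is created.
            if (line.startsWith(keyword, start)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] keywordsArr = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
        String[] linesOfCode = { "   Think about it", "x = 1;", "\t#Shop now" };
        for (int i = 0; i < linesOfCode.length; i++) {
            if (startsWithKeyword(linesOfCode[i], keywordsArr)) {
                linesOfCode[i] = "";
            }
        }
        System.out.println(String.join("|", linesOfCode));
    }
}
```

This keeps the same prefix semantics as trim().startsWith(...), so it shares the false-positive behaviour for words that merely begin with a keyword.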

Note that the startsWith comparison can produce false positives in the case that the line starts with a word whose prefix is a keyword. For example, if the keyword were break, a code line such as:

    breakfast = "ham and eggs";

would be incorrectly eliminated.

You might want to investigate using StringTokenizer to extract the first word in the string, or even better, use a regular expression.
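As a rough sketch of the StringTokenizer idea, the first whitespace-delimited token can be extracted and looked up in a set, so that only whole words match. The class name FirstWordFilter is hypothetical, the keyword list is taken from the question, and it is an assumption that a keyword must be followed by whitespace or end of line (a line starting with "Think:" would not match here).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

final class FirstWordFilter {

    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "Everything", "Think", "Result", "What", "#Shop", "#Cure"));

    /** Returns the first whitespace-delimited token of the line, or "" for a blank line. */
    static String firstToken(String line) {
        // The default delimiters are space, tab, newline, carriage return and form feed.
        StringTokenizer tokenizer = new StringTokenizer(line);
        return tokenizer.hasMoreTokens() ? tokenizer.nextToken() : "";
    }

    /** True only if the first token is exactly one of the keywords. */
    static boolean startsWithKeyword(String line) {
        return KEYWORDS.contains(firstToken(line));
    }

    public static void main(String[] args) {
        String[] linesOfCode = { "  Think twice", "Thinking = true;", "#Cure all" };
        for (int i = 0; i < linesOfCode.length; i++) {
            if (startsWithKeyword(linesOfCode[i])) {
                linesOfCode[i] = "";
            }
        }
        System.out.println(String.join("|", linesOfCode));
    }
}
```

Because the comparison is on the whole token, a single hash lookup replaces the inner loop over keywords, and a line such as "Thinking = true;" is left alone.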

0
On

This is something regexes are really good at. Your code is equivalent to

    for (int i = 0; i < linesOfCode.length; ++i) {
        // Blank the line if it starts, after optional whitespace, with a keyword.
        linesOfCode[i] = linesOfCode[i].replaceAll(
            "^\\s*(Everything|Think|Result|What|#Shop|#Cure).*", "");
    }

but you might need a word boundary (\\b) after the keyword. For more speed, compile the regex once and reuse it, like

    import java.util.regex.Pattern;

    private static final Pattern PATTERN = Pattern.compile(
        "^\\s*(Everything|Think|Result|What|#Shop|#Cure)\\b");

    for (int i = 0; i < linesOfCode.length; ++i) {
        // find() rather than matches(): the pattern only describes the start of the line.
        if (PATTERN.matcher(linesOfCode[i]).find()) {
            linesOfCode[i] = "";
        }
    }
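One detail worth checking when using a precompiled pattern this way: Matcher.matches() succeeds only if the entire line matches, whereas find() (or lookingAt()) only needs the anchored prefix. The sketch below illustrates the difference together with the word boundary. The class name KeywordRegexCheck and its method names are hypothetical, and the pattern assumes the keyword list from the question.

```java
import java.util.regex.Pattern;

final class KeywordRegexCheck {

    // Optional leading whitespace, one keyword, then a word boundary.
    static final Pattern PATTERN = Pattern.compile(
            "^\\s*(Everything|Think|Result|What|#Shop|#Cure)\\b");

    /** True if the line begins with a keyword; the rest of the line is ignored. */
    static boolean startsWithKeyword(String line) {
        return PATTERN.matcher(line).find();
    }

    /** True only if the line consists of optional whitespace plus a keyword and nothing else. */
    static boolean wholeLineIsKeyword(String line) {
        return PATTERN.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "  Think twice", "Thinking", "#Shop", "Result=1" };
        for (String line : samples) {
            System.out.println(line + " -> find: " + startsWithKeyword(line)
                    + ", matches: " + wholeLineIsKeyword(line));
        }
    }
}
```

With the \\b in place, "Thinking" is not treated as starting with the keyword Think, which addresses the prefix false positives that a plain startsWith check allows.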