I'm working on a program that calculates a hit-percentage between two strings (A and B). To get an accurate percentage I'm matching N-Grams with a list of strings that are permutations of String A.
Here's my code
public String[] generatePermutations( String name ){
String[] perms = new String[ calcN(name.length()) ];
int nameLen = name.length(),
cnt = 0;
for(int i = 0; i < name.length(); i++ ){
nameLen = name.length()-i;
for( int ii = 0; ii <= i; ii++){
perms[cnt++] = name.substring( ii, ii + nameLen );
}
}
return perms;
}
for reference calcN() is below
public int calcN( int n ){
return ( n * (n+1 )) / 2;
}
Given a String "ABC" this method will generate
{ "A", "B", "C", "AB", "BC", "ABC" }
Since I'm doing this operation thousands of times (perhaps hundreds of thousands) is there any way I can squeeze a few extra cycles out of my CPU? (besides switching to C++ or C). As always, thanks in advance for the advice!
The performance optimization of the method depends in part on the JVM in use. For example, in OpenJDK substring is implemented as:
That string constructor is a protected form that is implemented as:
Note that this doesn't need to create a new value (the char[] that backs the String).
On the other hand, as described in Java 7 Performance Tuning Guide this was removed because of a memory leak (the single character substring of a 1000 character long string that gets garbage collected still retains the 1000 character string backing it).
And so, the choice of which jvm you are using could have a significant effect upon the creation of strings from substrings.
Depending on your application, and how critical that performance is, you might consider implementing your own limited version of the String that reimplements the offset and length implementation for substring.