I was solving Leetcode 1044 and the answer is using binary search and rolling hash. Basically use binary search to select a length and then do a search for duplicate string of that length. Here rolling hash comes into play to save space (instead of using a set to store all substring, we store substring's hash). That is the background for the solution.
My question is in terms of the modulus used to prevent overflow. I chose Long.MAX_VALUE which I believe is big enough to handle it but the answer is not correct when I use Long.MAX_VALUE. However, when I use Long.MAX_VALUE / 26 or Math.pow(2, 32), they both work. Sorry I'm pretty bad about modulus and I think I definitely missed some things here. Could anyone shed some light on it? Thanks! The following is my solution:
public static long modulus = Long.MAX_VALUE / 26;
public String longestDupSubstring(String S) {
int n = S.length();
int l = 1;
int r = n - 1;
int index = -1;
while (l <= r) {
int m = l + (r - l) / 2;
int temp = findDuplicate(S, m);
if (temp != -1) {
index = temp;
l = m + 1;
}
else {
r = m - 1;
}
}
return index == -1 ? "" : S.substring(index, index + r);
}
private int findDuplicate(String s, int len) {
Set<Long> set = new HashSet<>();
long hash = 0;
long p = 1;
for (int i = 0; i < len; i++) {
hash = (hash * 26 + s.charAt(i) - 'a') % modulus;
p = (p * 26) % modulus;
}
set.add(hash);
for (int i = len; i < s.length(); i++) {
hash = (hash * 26 + (s.charAt(i) - 'a')
- (s.charAt(i - len) - 'a') * p) % modulus;
if (hash < 0) {
hash += modulus;
}
if (set.contains(hash)) {
return i - len + 1;
}
set.add(hash);
}
return -1;
}
26 is not part of the modulus, is part of hashing. If we would separate those in the algorithm, then we might see how it'd work. For modulus usually a large number would simply suffice, does not have to be a
long
: