Leetcode 1044. Longest Duplicate Substring (small question in terms of modulus)

379 Views Asked by At

I was solving Leetcode 1044 and the answer is using binary search and rolling hash. Basically use binary search to select a length and then do a search for duplicate string of that length. Here rolling hash comes into play to save space (instead of using a set to store all substring, we store substring's hash). That is the background for the solution.

My question is in terms of the modulus used to prevent overflow. I chose Long.MAX_VALUE which I believe is big enough to handle it but the answer is not correct when I use Long.MAX_VALUE. However, when I use Long.MAX_VALUE / 26 or Math.pow(2, 32), they both work. Sorry I'm pretty bad about modulus and I think I definitely missed some things here. Could anyone shed some light on it? Thanks! The following is my solution:

public static long modulus = Long.MAX_VALUE / 26;
public String longestDupSubstring(String S) {
    int n = S.length();
    int l = 1;
    int r = n - 1;
    int index = -1;
    while (l <= r) {
        int m = l + (r - l) / 2;
        int temp = findDuplicate(S, m);
        if (temp != -1) {
            index = temp;
            l = m + 1;
        }
        else {
            r = m - 1;
        }
    }
    return index == -1 ? "" : S.substring(index, index + r);
}
private int findDuplicate(String s, int len) {
    Set<Long> set = new HashSet<>();
    long hash = 0;
    long p = 1;
    for (int i = 0; i < len; i++) {
        hash = (hash * 26 + s.charAt(i) - 'a') % modulus;
        p = (p * 26) % modulus;
    }
    set.add(hash);
    
    for (int i = len; i < s.length(); i++) {
        hash = (hash * 26 + (s.charAt(i) - 'a')
                - (s.charAt(i - len) - 'a') * p) % modulus;
        if (hash < 0) {
            hash += modulus;
        }
        if (set.contains(hash)) {
            return i - len + 1;
        }
        set.add(hash);
    }
    return -1;
}
1

There are 1 best solutions below

0
Emma On

26 is not part of the modulus, is part of hashing. If we would separate those in the algorithm, then we might see how it'd work. For modulus usually a large number would simply suffice, does not have to be a long:

public final class Solution {
    int a = 26;
    int mod = 1 << 29;

    public final String longestDupSubstring(
        final String s
    ) {
        int lo = 1;
        int hi = s.length() - 1;

        while (lo <= hi) {
            int mid = lo + ((hi - lo) >> 1);
            int startIndex = search(s, mid);

            if (startIndex == - 1) {
                hi = mid - 1;
            }

            else {
                lo = -~mid;
            }
        }

        int startIndex = search(s, hi);
        return startIndex == -1 ? "" : s.substring(startIndex, startIndex + hi);
    }

    public final int search(
        final String s,
        final int len
    ) {
        long h = 0;
        long aL = 1;

        for (int i = 0; i < len; i++) {
            h = (h * a % mod + s.charAt(i)) % mod;
            aL = aL * a % mod;
        }

        HashMap<Long, List<Integer>> visited = new HashMap<>();
        visited.put(h, new ArrayList<Integer>());
        visited.get(h).add(0);

        for (int i = 1; i < -~s.length() - len; i++) {
            h = ((h * a % mod - s.charAt(i - 1) * aL % mod + mod) % mod + s.charAt(i + len - 1)) % mod;

            if (visited.containsKey(h)) {
                for (int start : visited.get(h)) {
                    if (s.substring(start, start + len).equals(s.substring(i, i + len))) {
                        return i;
                    }
                }

            } else {
                visited.put(h, new ArrayList<Integer>());
            }

            visited.get(h).add(i);
        }

        return -1;
    }
}