Known algorithms for incremental longest common substring?

78 Views Asked by At

Longest common substring is a well-known problem. I'm interested in identifying common substrings in a set of strings that is growing over time.

Use case: suppose I'm emitting log messages inside a tight loop. Many logging libraries offer a flag for suppressing duplicate messages, but what about similar messages? Contrived example:

for file in files {
    if ! process_file(file) {
        warn!("Failed to process {}.", file);
    }
}

If files has, say O(1) elements, this is all well & good. If it has O(10^6) elements, I'd like to see something like this in the log files:

Failed to process abc000000.txt.
Failed to process abc000001.txt.
Failed to process abc000002.txt.
[Repeated 912345 times]

in my log file.

I'm picturing a logging implementation that keeps track of longest common substring between the current & last log messages. If it spikes, it begins tracking all subsequent log messages. If more than n consecutive log messages display similar longest common substrings, it stops emitting them & starts a count.

Any prior art, here?

0

There are 0 best solutions below