I am looking for an implementation of a First Common Substring
Mike is not your average guy. I think you are great.
Jim is not your friend. I think you are great.
Being different is not your fault. I think you are great.
Using a Longest Common Substring implementation (and ignoring punctuation), you would get "I think you are great", but I am looking for the first-occurring common substring, which in this example is:
is not your
Perhaps an implementation that generates an ordered list of all common substrings, from which I can just take the first.
Edit
The tokens being compared would be complete words. I am looking for a greedy match of the first longest sequence of whole words. (Assuming a suffix tree were used in the approach, each node of the tree would be a word.)
There are quite a few steps to do this.
Code:
Result:
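As a minimal sketch of one way to approach this (a brute-force scan, not a suffix tree): split each string into word tokens, then walk the first string from left to right and, at each starting word, greedily extend the run for as long as it still occurs contiguously in every other string. The first start position that yields a match wins. The names `tokenize`, `contains_run`, `first_common_substring` and the `min_words` parameter are hypothetical, chosen for illustration. The sketch assumes matching is case-sensitive, that "first" means earliest in the first string, and that `\w+` is an acceptable definition of a word (so an apostrophe would split a word in two).

```python
import re


def tokenize(text):
    # Extract word tokens and drop punctuation (assumption: \w+ defines a word).
    return re.findall(r"\w+", text)


def contains_run(tokens, run):
    # True if `run` occurs as a contiguous slice of `tokens`.
    n = len(run)
    return any(tokens[i:i + n] == run for i in range(len(tokens) - n + 1))


def first_common_substring(sentences, min_words=1):
    # Returns the earliest (by position in the first sentence) greedy run of
    # whole words shared by all sentences, or "" if there is none.
    token_lists = [tokenize(s) for s in sentences]
    if not token_lists:
        return ""
    first, others = token_lists[0], token_lists[1:]
    for start in range(len(first)):
        best_end = start
        for end in range(start + 1, len(first) + 1):
            run = first[start:end]
            if all(contains_run(tokens, run) for tokens in others):
                best_end = end
            else:
                # A longer run cannot match once a shorter prefix fails.
                break
        if best_end - start >= min_words:
            return " ".join(first[start:best_end])
    return ""


sentences = [
    "Mike is not your average guy. I think you are great.",
    "Jim is not your friend. I think you are great.",
    "Being different is not your fault. I think you are great.",
]
print(first_common_substring(sentences))  # is not your
```

Raising `min_words` skips short matches; for the strings above, `min_words=4` would return "I think you are great" instead. This scan is quadratic or worse in the sentence length, which is fine for short sentences; a word-level generalized suffix tree, as suggested in the question, would be the route for large inputs.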