I am looking for an implementation of a First Common Substring
Mike is not your average guy. I think you are great.
Jim is not your friend. I think you are great.
Being different is not your fault. I think you are great.
Using a Longest Common Substring implementation (and ignoring punctuation), you would get "I think you are great", but I am looking for the first occurring common substring, in this example:
is not your
Perhaps an implementation that generates and ordered list of all common substrings that I can just take the first from.
Edit
The tokens being compared would be complete words. Looking for a greedy match of the first longest sequence of whole words. (Assuming a suffix tree was used in the approach, each node of the tree would be a word)
Here's a little something that will do what you want. You would actually adjust to pre-build your list of strings, pass that in and it will find for you... in this example, the phrase will be based of the string with the shortest string as a baseline.