How do I find the maximum phonetic and syntactic similarity between two strings?


I have two strings, e.g. "Sacred Heart" and "Sacred Heart Mountain". Let's call them s1 and s2. The two are obviously phonetically similar, but I have no idea how to show this algorithmically. I'm using Levenshtein distance as my similarity metric, with NYSIIS and Match Rating Codex (MRC) encodings for phonetic compression. So far these work well, except when one string is significantly longer than the other but contains a substring that is highly similar to the shorter string. Ideally, the algorithm would run in linear time, O(n), or better.
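To make the failure case concrete, here is a minimal sketch. It assumes a plain character-level Levenshtein distance normalized by the longer string's length; the exact normalization used above is not stated, and the phonetic encoding step is left out. The names `levenshtein` and `similarity` are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


s1 = "Sacred Heart"
s2 = "Sacred Heart Mountain"
distance = levenshtein(s1.lower(), s2.lower())
score = similarity(s1.lower(), s2.lower())
print(distance, round(score, 3))
```

Even though s1 appears verbatim at the start of s2, every extra character in " Mountain" counts as an edit, so the whole-string score ends up well below 1.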

The longer the matching substring, the higher the score should be. I tried splitting both strings into space-separated substrings and comparing all possible combinations against each other. This was far too slow, and because of my evaluation metric, shorter substrings tended to score better against each other. I couldn't find a similar problem online, so I am unsure whether this can be solved algorithmically.
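The token-by-token attempt described above could look roughly like the sketch below. It is an illustration under assumptions, not the original code: it compares single space-separated tokens only, uses the same length-normalized Levenshtein score, and the helper names are hypothetical.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def token_similarity(a: str, b: str) -> float:
    """Assumed metric: 1 - distance / length of the longer token."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def pairwise_token_scores(s1: str, s2: str) -> dict:
    """Score every token of s1 against every token of s2."""
    tokens1 = s1.lower().split()
    tokens2 = s2.lower().split()
    return {(t1, t2): token_similarity(t1, t2)
            for t1, t2 in product(tokens1, tokens2)}


scores = pairwise_token_scores("Sacred Heart", "Sacred Heart Mountain")
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

The number of comparisons is the product of the two token counts, and each Levenshtein call is itself quadratic in token length, which is consistent with the slowness noted above. Because each pair is scored in isolation, a perfect match on a short token counts as much as a perfect match on a long one unless the score is additionally weighted by length.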
