Algorithm to calculate how much of text A is in text B?


I need to calculate how much of a block of text (A) is contained in another block of text (B). Simple algorithms like Soundex aren't giving me good results, because text B contains additional text that isn't, and shouldn't be, in text A, which throws my figures off. I need to ensure that a certain percentage of A is present in B while ignoring the additions to B.

My first thought for a simple algorithm that might work well in my case is to split A into sentences, note the total number of sentences, then search B for an instance of each sentence and report the fraction found as a percentage. While this should work, it feels quite hacky, and I'm sure someone more intelligent than I has devised an algorithm that gives a better calculation on a similar principle.
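For reference, the sentence-matching idea described above could be sketched roughly as follows. This is only an illustration under assumptions of my own: sentences are split naively on `.`, `!` or `?` followed by whitespace, matching ignores case and runs of whitespace, and the function names `split_sentences`, `normalize` and `sentence_coverage` are made up for this example.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(s):
    # Lowercase and collapse whitespace so trivial differences do not block a match.
    return " ".join(s.lower().split())


def sentence_coverage(a, b):
    # Fraction of A's sentences that occur verbatim (after normalization) in B.
    sentences = split_sentences(a)
    if not sentences:
        return 0.0
    haystack = normalize(b)
    found = sum(1 for s in sentences if normalize(s) in haystack)
    return found / len(sentences)
```

A sentence that differs from its counterpart in B by even one word counts as entirely missing here, which is the main weakness of the approach.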


There is 1 solution below


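Longest Common Subsequence (LCS) looks best suited to your purpose. A subsequence does not have to be contiguous, so extra text inserted into B does not break the match, and dividing the LCS length by the length of A gives the proportion of A that appears, in order, within B.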
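A minimal sketch of how LCS could be turned into a percentage, assuming the texts are compared word by word. The whitespace tokenization, the lowercasing, and the names `lcs_length` and `lcs_coverage` are illustrative choices rather than anything prescribed by the algorithm.

```python
def lcs_length(xs, ys):
    # Standard dynamic-programming LCS, keeping only the previous row in memory.
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_coverage(a, b):
    # Fraction of A's words that appear, in the same order, somewhere in B.
    a_words = a.lower().split()
    b_words = b.lower().split()
    if not a_words:
        return 0.0
    return lcs_length(a_words, b_words) / len(a_words)
```

For example, with A = "the quick brown fox jumps" and B = "today the quick red fox happily jumps over", the common subsequence is "the quick fox jumps", so the coverage is 4 of 5 words. The running time is proportional to the product of the two word counts, which may matter for very long texts.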