Is it possible to use Knuth-Morris-Pratt Algorithm for string matching on text to text?

904 Views Asked by At

I have a KMP code in PHP which is can do string matching between word to text. I wonder if i can use KMP Algorithm for string matching between text to text. Is it possible or not? and how can i use it for finding the matching of the string between 2 text.

Here's the core of KMP algorithm :

<?php
    class KMP{
      function KMPSearch($p,$t){
        $result = array();
        $pattern = str_split($p); 
        $text    = str_split($t);
        $prefix = $this->preKMP($pattern);
    // print_r($prefix);

     // KMP String Matching
     $i = $j = 0;
        $num=0;
        while($j<count($text)){
          while($i>-1 && $pattern[$i]!=$text[$j]){
         // if it doesn't match, then uses then look at the prefix table
            $i = $prefix[$i];
          }
          $i++;
          $j++;
      if($i>=count($pattern)){
         // if its match, find the matches string potition
      // Then use prefix table to swipe to the right.
            $result[$num++]=$j-count($pattern);
            $i = $prefix[$i];
          }
        }
     return $result;
      }

      // Making Prefix table with preKMP function
      function preKMP($pattern){
        $i = 0;
        $j = $prefix[0] = -1;
        while($i<count($pattern)){
          while($j>-1 && $pattern[$i]!=$pattern[$j]){
            $j = $prefix[$j];
          }
          $i++;
          $j++;
          if(isset($pattern[$i])==isset($pattern[$j])){
            $prefix[$i]=$prefix[$j];
          }else{
            $prefix[$i]=$j;
          }
        }
        return $prefix;
      }
    }
    ?>

I calling this class to my index.php if i want to use to find word on the text.

This is the step that i want my code do : (1). I input a text 1 (2). I input a text 2 (3). I want a text 1 become a pattern (every single word is in text 1 treat as pattern) (4). I want my code can find every pattern on text 1 in text 2 (5). Last, my code can show me what the percentage of similarity.

Hope you all can help me or teach me. I've been serching for the answer everywhere but can't find it yet. At least you can teach me.

1

There are 1 best solutions below

5
On

If you just need to find all words that are present in both texts, you don't any string search algorithm to do it. You can just add all words from the first text to a hash table, iterate over the second text and add the words that are in a hash table to the output list.

You can use a trie instead of a hash table if you want a linear time complexity in the worst case, but I'd get started with a hash table because it's easy to use and is likely to be good enough for practical purposes.