Rabin-Karp method for string matching

219 Views Asked by At

Using the Rabin-Karp method for string matching, how many character-character matches does the method use? Where:

source  = abcabcabcacd
pattern = bcac
1

There are 1 best solutions below

0
On

Let's start from the algorithm itself. How does Rabin-Karp methods work. I will use C# code to illustration. Having

  string target = "bcac";
  string source = "abcabcabcacd";

we compute hash for the target:

  int targetHash = RabinHash(target);

and then efficiently compute hashes for all substring of target length (here of substrings of length 4): "abca", "bcab", ... "cacd". If targetHash equals to substring's hash we compare this substring and target corresponding letters. For instance:

if targetHash = 888 and for substrings we have, say,

  abca : 555 
  bcab : 345 
  cabc : 888 <- check this (failure due to hash collision)
  abca : 555 
  bcab : 345 
  cabc : 888 <- check this (failure due to hash collision) 
  abca : 555 
  bcac : 888 <- check this (success)
  cacd : 900  
 

Here we have 3 episodes of character-character matching. Now we are ready to answer your question: the number of character-character matching depends on hash function.

Worst case: we have nothing but hash collisions:

  int RabinHash(string value) => 1;

All 9 substrings should be character-character checked.

Best case: we have no hash collisions; say, we can implement a typical Rabin hash function as

  int RabinHash(string value) => value.Aggregate(0, (s, a) => s * 26 + a - 'a');

And we should check the only 1 actual matching:

  string target = "bcac";
  string source = "abcabcabcacd";

  // int RabinHash(string value) => 1;
  int RabinHash(string value) => value.Aggregate(0, (s, a) => s * 26 + a - 'a');

  int targetHash = RabinHash(target);

  var report = Enumerable
    .Range(0, source.Length - target.Length + 1)
    .Select(i => source.Substring(i, target.Length))
    .Select(chunk => new { 
      chunk,
      hash = RabinHash(chunk)
    })
    .Select(item => $"{item.chunk} : {item.hash,5} {(item.hash == targetHash ? "<- check it" : "")}");

  Console.Write(report);

Outcome:

  abca :   728 
  bcab : 18929 
  cabc : 35180 
  abca :   728 
  bcab : 18929 
  cabc : 35180 
  abca :   728 
  bcac : 18930 <- check it
  cacd : 35207

Final answer: Depending on hash function, we have to check from 1 up to 9 times.