Apologies for this probable repeated question.
I am trying to use rolling hash with Karp Rabin.I have looked a different implementations of rolling hash and I am wondering where I am going wrong. Though the text has the pattern the match using hash does not seem to be occurring at all. Attaching (a part of)code for calculating the hash and the search.
long hash(char* key, int len) {
int j = 0;
unsigned long long h = 0;
for (j = 0; j < len; j++) {
h = h * PRIME_BASE + key[j];
h %= PRIME_MOD;
}
return h;
}
int search(char* pattern, char *txt, int textLength, int patternLength) {
int i, val = 0;
long long txtHash=0;
long power = 1;
for (i = 0; i < patternLength; i++)
power = (power * PRIME_BASE) % PRIME_MOD;
i=0;
printf(" the value of power is %ld ",power);
for (i = 0; i < textLength; i++) {
txtHash = txtHash * PRIME_BASE + txt[i];
txtHash %= PRIME_MOD;
if (i >= patternLength)
{
txtHash -= power * txt[i - patternLength] % PRIME_MOD;
if (txtHash < 0){
//negative can be made positive with mod
txtHash += PRIME_MOD;
}
}
int offset=0;
if(i>=patternLength){
offset=i-patternLength+1;
}
else{
offset=0;
}
if (patHash == txtHash) {
if (check(pattern, txt, offset, patternLength)) {
val++;
}
}
}
if (val > 0) {
return val;
}
// no match
return 0;
}
bool check(char* pattern, char* txt, int k, int M) {
int j = 0;
for (j = 0; j < M; j++) {
if (pattern[j] != txt[k + j]) {
return false;
}
}
return true;
}
I had the problem of the buffer overflow, which I have dealt with but the pattern and text hash do not seem to be matching for a protein sequence text string( with 1000 characters) and 17 characters pattern Any ideas where I could be going wrong ?
Thanks, Bhavya
I spent some more time on this problem and found that because I had initialised the value of long long txtHash to some default value I was facing the situation wherein the hashes were not matching. updating the code above with the fix