I have a problem with implementation of output function for Aho-Corasick algorithm. In general I don't quite understand how output function works.
According to this paper in goto function I put appropriate pattern index to output, like output[currentState] = patternIndex;
in fail function I merge existed output for state S with output for failed state S, like output[s] += output[fail[s]]; in search function I use such condition: if (output[state]) != 0) then we find our pattern.
But such approach doesn't work and I've got nonsenses result. Maybe, pseudocode for output function means something else in that paper, I don't know.
Output function with bitwise mapping which I've got here also doesn't work correctly in most cases. Wrong patterns match with this condition if ((output[currentState] & (1 << j)) != 0)
Besides, I don't quite understand why bitwise operations are using for output calculation.
I will be grateful for any help in clarification of implementation of the output function.
EDIT1: seems, that problem was in overflowing after bit shift operation: output[currentState] |= (1 << i);and out[currentState] & (1 << j)
So far, tried to use BigInteger, but seems, that it causes a problem with performance.
EDIT2: tried BitSet, but performance still very poor. Maybe, there is a problem in implementation of algorithm, i don't know.
The 32 bits bitwise operation just like this below
pattern:
he, she, his, hersThe total state number is
9, and2, 5, 7, 9state is final state like below AC automationOutput function represented which a pattern you matched in current state. like :
so use
output[currentState] & (1 << j)to calculate which pattern is matched.This operation just like one-hot encoding. And the operation problem is type of output function. If the output function type is
32bits, and we can just search at most31patterns. Because over the31patterns, it will overflow by shifting.