How to speed up a CYK Algorithm in C++?

1.5k Views Asked by At

I would like to implement CYK algorithm in C/C++, but available on various websites pseudo-code doesn't answer how to implement it efficiently. I wrote a version that uses some stl structures like map and sets, but it's very slow. I was thinking about improve my implementation by using only binary operations, but I don't know how to store my table with sets. Lets say that we have only 8 symbols for non terminals and 26 for terminals. I was thinking about using table of unsigned chars (2^8 -> 8 positions for 0-1) for storing information about productions, but I don't know how to store it.

Could you give me some help or clue?

1

There are 1 best solutions below

0
On

You don't provide many details, a simple implementation (even pseudo code) could help a lot to give you hints.

From wikipedia:

let the input be a string S consisting of n characters: a1 ... an. let

for this you can use a simple string, or a vector of chars

the grammar contain r nonterminal symbols R1 ... Rr.

I would store the nonterminal symbols in an array of bools std::array nonterminal {}; then since yu have characters you can initialize the position corresponding to the char, with true.

nonterminal[static_cast('C')] = true; you do the same with the terminal and you have a really fast look up mechanism.

This grammar contains the subset Rs which is the set of start symbols. let P[n,n,r] be an array of booleans. Initialize all elements of P to false. for each i = 1 to n for each unit production Rj -> ai set P[i,1,j] = true for each i = 2 to n -- Length of span for each j = 1 to n-i+1 -- Start of span for each k = 1 to i-1 -- Partition of span for each production RA -> RB RC if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then S is member of language else S is not member of language

The algorithm seems pretty straightforward after that. Just make sure not to initialize temporary values inside tight loops and you'll be fine.