GloVe implementation: I am trying to get the co-occurrence values for word pairs by reading the binary cooccurence.bin file with Python. This file is produced in the third step, by running the cooccur program.
Has anyone tried this? Judging from the source, each record seems to hold three values for the pair plus an id:
typedef struct cooccur_rec_id { int word1; int word2; real val; int id; } CRECID;
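One way to check the byte counts is to mirror the struct in Python with ctypes, which applies the platform's C alignment rules. This is a sketch under two assumptions: GloVe's real is a typedef for double, and the record actually written to disk is the same struct without the id field. The Python class names are hypothetical, chosen for illustration only.

```python
import ctypes

# Hypothetical mirror of the record WITHOUT the id field.
# Assumption: GloVe's "real" is a typedef for double.
class CooccurRec(ctypes.Structure):
    _fields_ = [("word1", ctypes.c_int),
                ("word2", ctypes.c_int),
                ("val", ctypes.c_double)]

# Hypothetical mirror of cooccur_rec_id (the struct quoted above).
# Assumption: it is only used in memory while merging temporary files.
class CooccurRecId(ctypes.Structure):
    _fields_ = [("word1", ctypes.c_int),
                ("word2", ctypes.c_int),
                ("val", ctypes.c_double),
                ("id", ctypes.c_int)]

# 4 + 4 + 8 bytes, no padding needed: 16 on common platforms.
print(ctypes.sizeof(CooccurRec))
# 20 bytes of fields; on typical 64-bit platforms C pads this to 24.
print(ctypes.sizeof(CooccurRecId))
```

If the first number is 16, that matches the 16-byte stride that already yields correct integers, which points at val being an 8-byte double rather than a 4-byte float.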
When the records are written out, though, I only see three values:
- index of the first word (integer - 4 bytes)
- index of the second word (integer - 4 bytes)
- co-occurrence value (real - 4 bytes? 8 bytes?)
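Here is a small sketch of what goes wrong when the third field is read with the wrong width. It assumes real is a double and that the file was written on a little-endian machine (e.g. x86-64), so one record would be "<iid" in Python's struct notation. The record values are made up and packed in memory rather than read from a real file.

```python
import struct

# Assumed layout of one record: int32 word1, int32 word2, float64 val,
# little-endian, 16 bytes in total.
RECORD = struct.Struct("<iid")

# A made-up record built in memory to show the round trip.
raw = RECORD.pack(3, 17, 2.5)
print(RECORD.size)         # 16
print(RECORD.unpack(raw))  # (3, 17, 2.5)

# Reading the same 16 bytes as int, int, 4-byte float, 4 pad bytes:
# the integers still come out right, but the float is taken from the
# low half of the double's bytes, which is meaningless.
word1, word2, wrong_val = struct.unpack("<iif4x", raw)
print(word1, word2, wrong_val)  # 3 17 0.0
```

This reproduces the symptom described here: correct integers with a 16-byte stride, but a nonsensical value when it is decoded as a 4-byte real.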
I inferred this layout from the source of the cooccur program, but I can't work out the right number of bytes to read per record. Reading 16 bytes at a time, I get the two integers correctly, but the co-occurrence value doesn't make sense.
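Under the same assumptions (16-byte records of int32, int32, float64, little-endian, no file header), a minimal reader could look like the following. The function name read_cooccurrences is hypothetical.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Assumed record layout: int32 word1, int32 word2, float64 val (little-endian).
RECORD = struct.Struct("<iid")

def read_cooccurrences(f: BinaryIO) -> Iterator[Tuple[int, int, float]]:
    """Yield (word1, word2, value) triples from a binary stream.

    Hypothetical helper: assumes fixed 16-byte records and no header.
    """
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record: file size is not a multiple of 16")
        yield RECORD.unpack(chunk)
```

Usage would be something like `with open("cooccurence.bin", "rb") as f:` followed by `for w1, w2, val in read_cooccurrences(f): ...`. If I read glove.c correctly, the word indices are 1-based, i.e. index n refers to line n of the vocab file, but that is worth confirming against your own vocab output.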
Has anyone managed to do this? Any help would be appreciated.