I want to create a co-occurrence matrix from a single string, using bigrams instead of unigrams. I am referring to the following links:
http://text2vec.org/glove.html
https://tm4ss.github.io/docs/Tutorial_5_Co-occurrence.html#3_statistical_significance
I want to create the matrix and then traverse it to build a dataset with the following columns:
Term1 Term2 Score
The main difficulty is traversing the sentence with bigrams. Any help on this would be great.
Just specify your bigrams and create the co-occurrence matrices. Below are some (really) simple examples. Choose one package and do everything with that one. Both quanteda and text2vec can use multiple cores / threads. Traversing the resulting co-occurrence matrices can be done with reshape2::melt, like this:
reshape2::melt(as.matrix(my_cooccurence_matrix))
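To make the melt step concrete, here is a minimal sketch on a hand-built toy matrix. The bigram names and counts are invented purely for illustration, and the column names Term1, Term2 and Score are taken from the question; dropping zero scores is an optional extra, not something the answer requires.

```r
library(reshape2)

# Toy symmetric co-occurrence matrix; terms and counts are made up
terms <- c("the_quick", "quick_brown", "brown_fox")
toy_matrix <- matrix(
  c(0, 2, 1,
    2, 0, 0,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(terms, terms)
)

# One row per matrix cell, with the column names asked for in the question
long <- melt(toy_matrix, varnames = c("Term1", "Term2"), value.name = "Score")

# Optionally keep only pairs that actually co-occur
long <- long[long$Score > 0, ]
print(long)
```

The same call should work on a real matrix after `as.matrix()`, although a dense conversion can get large for a big vocabulary.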
Using quanteda to create a feature co-occurrence matrix:
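The original code for this step is not shown here, so the following is only a sketch of how it might look with quanteda. The example sentence, the object names and the window size of 3 are assumptions chosen for illustration.

```r
library(quanteda)

# Example sentence (assumed); no punctuation, so every token is a word
txt <- "the quick brown fox jumps over the lazy dog"

# Tokenize, then join adjacent tokens into bigrams such as "the_quick"
toks <- tokens(txt)
bigram_toks <- tokens_ngrams(toks, n = 2)

# Count how often bigrams appear within 3 positions of each other
bigram_fcm <- fcm(bigram_toks, context = "window", window = 3)
print(bigram_fcm)
```

By default `fcm()` fills only the upper triangle (`tri = TRUE`), so each unordered pair is counted once. The result can then be passed through `as.matrix()` and `reshape2::melt()` as described above.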
Using text2vec to create a feature co-occurrence matrix:
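Again, the original code is missing, so this is a sketch under assumptions rather than the exact code from the answer. The sentence, the object names and `skip_grams_window = 3L` are illustrative choices. In text2vec the term co-occurrence matrix (TCM) plays the role of the feature co-occurrence matrix.

```r
library(text2vec)

# Example sentence (assumed)
txt <- "the quick brown fox jumps over the lazy dog"

# Iterator over the tokenized text
it <- itoken(txt, preprocessor = tolower, tokenizer = word_tokenizer,
             progressbar = FALSE)

# ngram = c(2L, 2L) restricts the vocabulary to bigrams only
vocab <- create_vocabulary(it, ngram = c(2L, 2L))
vectorizer <- vocab_vectorizer(vocab)

# Sparse term co-occurrence matrix over a window of 3 bigrams
bigram_tcm <- create_tcm(it, vectorizer, skip_grams_window = 3L)
print(bigram_tcm)
```

Note that `create_tcm()` weights co-occurrences by inverse distance by default, so the scores are not plain counts unless the `weights` argument is changed.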