R text2vec: rsparse::GloVe$new() sets the model environment, GlobalVectors$new() does not


Problem: I am fitting a GloVe model in R with library(text2vec). The model environment is set when I run rsparse::GloVe$new(), but it is not set when I run GlobalVectors$new().

I then ran wv_main = glove$fit_transform(tcm...) and got an error. tcm is a valid dgTMatrix (an S4 object) with dimensions 545 x 545. The call and the error:

wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01, n_threads = 8)

Error in cpp_glove_create(glove_params) : Not compatible with requested type: [type=S4; target=double].

I am looking for help with this "Not compatible with requested type" error from glove$fit_transform(tcm). The full code:

tokens = space_tokenizer(df_sample)
token_iter = itoken(tokens, progressbar = FALSE)
vocab = create_vocabulary(token_iter)
vocab = prune_vocabulary(vocab, term_count_min = 5L)
vectorizer = vocab_vectorizer(vocab)
tcm = create_tcm(token_iter, vectorizer, skip_grams_window = 5L)
glove = GlobalVectors$new(word_vectors_size = 50, x_max = 10)
glove <- rsparse::GloVe$new(tcm, rank = 50, x_max = 10, learning_rate = .25)
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01, n_threads = 8)

dput(glove)
<environment>

There is 1 best solution below


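The term co-occurrence matrix (a dgTMatrix) passed to fit_transform() is correct. The problem is in the constructor: GloVe$new(), which builds the GloVe matrix factorization model, only takes hyperparameters: the desired dimension (rank), the maximum number of co-occurrences (x_max), the learning rate for SGD, alpha, lambda, and shuffle. It does not take the data. When tcm is passed to GloVe$new() as an unnamed argument, R matches it to one of those numeric parameters, so the S4 matrix ends up where a double is expected, which is what the error message reports. Therefore GloVe$new(rank = 50, x_max = 10), without the dgTMatrix (tcm), properly creates the GloVe model, and tcm is passed only to fit_transform().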
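
For reference, here is a minimal sketch of the corrected pipeline. The asker's df_sample is not available, so this assumes the movie_review data set shipped with text2vec as a stand-in corpus. The 500-review subset and n_threads = 2 are arbitrary choices for illustration, and learning_rate is left at its default. Everything else follows the code in the question, with tcm removed from the constructor call.

```r
library(text2vec)

# Stand-in corpus (assumption): movie_review ships with text2vec.
# Replace this with your own character vector, e.g. df_sample.
data("movie_review")
docs = tolower(movie_review$review[1:500])

# Same preprocessing steps as in the question
tokens = space_tokenizer(docs)
token_iter = itoken(tokens, progressbar = FALSE)
vocab = create_vocabulary(token_iter)
vocab = prune_vocabulary(vocab, term_count_min = 5L)
vectorizer = vocab_vectorizer(vocab)
tcm = create_tcm(token_iter, vectorizer, skip_grams_window = 5L)

# The constructor receives hyperparameters only, no data
glove = rsparse::GloVe$new(rank = 50, x_max = 10)

# The co-occurrence matrix is passed at fitting time
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01, n_threads = 2)

# Rows are vocabulary terms, columns are the 50 embedding dimensions
dim(wv_main)
```

If the call succeeds, wv_main should have one row per term in tcm and 50 columns.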