How to use a custom NRC-style lexicon on Syuzhet for R?

I am new to R and new to working with Syuzhet.

I am trying to build a custom NRC-style lexicon to use with the Syuzhet package in order to categorize words. Although this functionality now exists in Syuzhet, it doesn't seem to recognize my custom lexicon. Please excuse the odd variable names and the extra libraries; I plan to use them for other things later and am just testing for now.

library(sentimentr)
library(pdftools)
library(tm)
library(readxl)
library(syuzhet)
library(tidytext)

texto <- "I am so love hate beautiful ugly"

text_cust <- get_tokens(texto)


custom_lexicon <- data.frame(lang = c("eng","eng","eng","eng"), word = c("love", "hate", "beautiful", "ugly"), sentiment = c("positive","positive","positive","positive"), value = c("1","1","1","1"))


my_custom_values <- get_nrc_sentiment(text_cust, lexicon = custom_lexicon)                             

I get the following error:

my_custom_values <- get_nrc_sentiment(text_cust, lexicon = custom_lexicon)
New names:
• value -> value...4
• value -> value...5
Error in FUN(X[[i]], ...) : custom lexicon must have a 'word', a 'sentiment' and a 'value' column

As far as I can tell, my data frame exactly matches that of the standard NRC library, containing columns labeled 'word', 'sentiment', and 'value'. So I'm not sure why I am getting this error.

There is 1 best solution below

The CRAN version of syuzhet's get_nrc_sentiment doesn't accept a lexicon, but get_sentiment does. Your custom_lexicon also has an error: the value column needs to hold numeric values, not character strings (1 rather than "1"). And to use your own lexicon you need to set method = "custom"; otherwise the custom lexicon is ignored. The code below works with syuzhet alone.

library(syuzhet)

texto <- "I am so love hate beautiful ugly"

text_cust <- get_tokens(texto)
# value must be numeric (1), not character ("1")
custom_lexicon <- data.frame(lang = c("eng","eng","eng","eng"),
                             word = c("love", "hate", "beautiful", "ugly"),
                             sentiment = c("positive","positive","positive","positive"),
                             value = c(1,1,1,1))
# method = "custom" is required, otherwise the lexicon argument is ignored
get_sentiment(text_cust, method = "custom", lexicon = custom_lexicon)

[1] 0 0 0 1 1 1 1
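
If the lexicon already exists with character scores, as in the question, one option is to coerce the value column instead of retyping it. The sketch below does that and then tallies the matched tokens per sentiment category with base R only, as a rough stand-in for the per-category counts the question was after. The tally is an assumption about what is wanted and is not part of syuzhet; the names tokens, scores, matched and per_category are only illustrative.

```r
library(syuzhet)

# Lexicon as defined in the question: `value` is character, not numeric
custom_lexicon <- data.frame(
  lang      = c("eng", "eng", "eng", "eng"),
  word      = c("love", "hate", "beautiful", "ugly"),
  sentiment = c("positive", "positive", "positive", "positive"),
  value     = c("1", "1", "1", "1")
)

# Coerce the scores to numeric before passing the lexicon to syuzhet
custom_lexicon$value <- as.numeric(custom_lexicon$value)

tokens <- get_tokens("I am so love hate beautiful ugly")

# Per-token scores from the custom lexicon
scores <- get_sentiment(tokens, method = "custom", lexicon = custom_lexicon)

# Base-R tally per sentiment category (not a syuzhet function):
# join the tokens to the lexicon, then sum the values within each category
matched <- merge(data.frame(word = tokens), custom_lexicon, by = "word")
per_category <- aggregate(value ~ sentiment, data = matched, FUN = sum)
print(per_category)
```

Because merge keeps one row per matching token, a word that appears several times in the text is counted each time. A lexicon with more than one sentiment label would produce one row per label in per_category.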