I ran a VADER sentiment analysis on multiple files and the compound score for all of them was 1; how can I validate this result?


I have transcript files from 6 different interviews and am running a sentiment analysis on the text using VADER. The compound score for all the files was 1. This does not seem correct to me, but I'm not sure why it happened or how to troubleshoot it.

The code I have is:

  for (i in MD_scripts) {
    file_MD <- read_file(i)
    gsub("[\r\n]", "", file_MD)
    vader_MD <- get_vader(file_MD)
    df_vader <- data.frame(rbind(df_vader, vader_MD))
   } 

The pos, neu, neg scores are also eerily similar, but not exactly the same. Any tips/ideas?

I thought of running VADER on individual sentences (successful in doing this) and trying to calculate the overall compound score by hand, but I could not figure out how to do that.

Answer by Rui Barradas

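Here are two ways of correcting the code in the question. In the original, the result of gsub() is never assigned, so the line breaks are not actually removed; both versions below instead read each file line by line and collapse the lines into a single string before scoring it.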

One, use a for loop. You will have to create a results list vader_list beforehand.

library(vader)

vader_list <- vector("list", length = length(MD_scripts))
for (i in seq_along(MD_scripts)) {
  file_MD <- MD_scripts[[i]] |>
    readLines() |> 
    paste(collapse = " ")
  vader_list[[i]] <- get_vader(file_MD)
} 
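
If you want a data frame like the df_vader in the question, the list can be bound together after the loop. This is only a sketch: the two-element vader_list below is made-up stand-in data, and it assumes get_vader() returns a named character vector with the elements word_scores, compound, pos, neu, neg and but_count, so check that against your own output first.

```r
# Made-up stand-in for the list built by the loop: each element mimics the
# named character vector that get_vader() is assumed to return.
vader_list <- list(
  c(word_scores = "{0, 1.9}", compound = "0.44", pos = "0.6",
    neu = "0.4", neg = "0", but_count = "0"),
  c(word_scores = "{0, -1.5}", compound = "-0.36", pos = "0",
    neu = "0.5", neg = "0.5", but_count = "0")
)

# Stack the per-file results into one data frame, one row per file.
df_vader <- as.data.frame(do.call(rbind, vader_list), stringsAsFactors = FALSE)

# The scores arrive as text, so convert the numeric columns explicitly.
num_cols <- c("compound", "pos", "neu", "neg", "but_count")
df_vader[num_cols] <- lapply(df_vader[num_cols], as.numeric)
```

Binding once at the end avoids growing the data frame with rbind() inside the loop.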

You can also use lapply, which makes the code simpler.

library(vader)

vader_list <- lapply(MD_scripts, \(fl) {
  fl |>
    readLines() |> 
    paste(collapse = " ") |>
    get_vader()
})
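
On the compound score itself: a value of 1 for a whole transcript may not be a bug. In the reference VADER implementation the summed word valences are squashed with x / sqrt(x^2 + 15), so a long text with many sentiment words saturates near 1 or -1. I am assuming the R package does the same. The sketch below shows that effect, plus one possible way to get a per-file figure by averaging sentence-level scores. Both function names are my own, the sentence splitting is a crude regex, and the scorer argument is a placeholder for whatever returns a numeric compound score per sentence.

```r
# Normalization used by the reference VADER implementation (alpha = 15):
# maps an unbounded sum of word valences into the open interval (-1, 1).
normalize_compound <- function(valence_sum, alpha = 15) {
  valence_sum / sqrt(valence_sum^2 + alpha)
}

normalize_compound(2)    # a short sentence stays well inside the range
normalize_compound(400)  # a long transcript saturates close to 1

# Hypothetical helper: split text into sentences on ., ! or ? followed by
# whitespace, score each sentence with `scorer`, and average the results.
mean_sentence_compound <- function(text, scorer) {
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  sentences <- sentences[nzchar(trimws(sentences))]
  mean(vapply(sentences, scorer, numeric(1)))
}

# With the vader package, `scorer` could be something like this
# (assuming get_vader() returns a named vector with a "compound" element):
# scorer <- function(s) as.numeric(get_vader(s)[["compound"]])
```

An average of sentence scores is a different quantity from VADER's own compound score for the full text, so treat it as a complementary summary rather than a replacement.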