How does the "sentimentr" package split a paragraph into multiple sentences?

I am trying to run sentiment analysis in R using the "sentimentr" package. I fed in a list of comments and got element_id, sentence_id, word_count, and sentiment in the output. Long comments are being split into individual sentences. What logic does the package use to do that?

My comments fall into 4 main categories: Food, Atmosphere, Price, and Service. I have also set up bigrams for those themes, and I am trying to split sentences based on themes.

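# Install once (uncomment on first run), then load the package
# install.packages("sentimentr")
library(sentimentr)

data <- read.csv("Comments.csv", stringsAsFactors = FALSE)

# sentiment() works on a character vector;
# this assumes the comments are in the first column of the CSV
comments <- data[[1]]
scores <- sentiment(comments)
# scores

write.csv(scores, "results.csv")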

For example, the comment "We had a large party of about 25, so some issues were understandable. But the servers seemed totally overwhelmed. There are so many issues I cannot even begin to explain. Simply stated food took over an hour to be served, it was overcooked when it arrived, my son had a steak that was charred, manager came to table said they were now out of steak, I could go on and on. We were very disappointed" got split up into 5 sentences:

1) We had a large party of about 25, so some issues were understandable.
2) But the servers seemed totally overwhelmed.
3) There are so many issues I cannot even begin to explain.
4) Simply stated food took over an hour to be served, it was overcooked when it arrived, my son had a steak that was charred, manager came to table said they were now out of steak, I could go on and on.
5) We were very disappointed

I want to know whether there is any semantic logic behind the splitting, or whether it is just based on full stops.

There are 1 best solutions below

sentimentr does the splitting with textshape::split_sentence(); see https://github.com/trinker/sentimentr/blob/e70f218602b7ba0a3f9226fb0781e9dae28ae3bf/R/get_sentences.R#L32
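
If you want to see the splits before any scoring happens, a minimal sketch like the one below may help. It assumes sentimentr is installed, and the sample text is just a shortened version of the comment from the question.

```r
library(sentimentr)

# Shortened sample taken from the comment in the question
comment <- paste(
  "We had a large party of about 25, so some issues were understandable.",
  "But the servers seemed totally overwhelmed.",
  "We were very disappointed"
)

# get_sentences() returns a list with one character vector per input element;
# each vector holds the sentences that element was split into
sentences <- get_sentences(comment)
print(sentences[[1]])

# The same object can be passed on to sentiment() for scoring
scores <- sentiment(sentences)
print(scores)
```

Each row of the scores corresponds to one of the sentences printed above, which is where sentence_id in the output comes from.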

A bit of searching found the logic is here:

https://github.com/trinker/textshape/blob/13308ed9eb1c31709294e0c2cbdb22cc2cac93ac/R/split_sentence.R#L148

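So yes, it splits on ?, . and !, but it also uses a set of regular expressions to catch exceptions where a period does not end a sentence, such as "No.7" and "Philip K. Dick". The splitting is based on punctuation patterns rather than on the meaning of the text.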
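
To make the idea concrete, here is a rough base-R sketch of "split on end punctuation, but protect known exceptions first". This is not textshape's actual code: the function name naive_split_sentences, the placeholder string, and the two exception patterns are all hypothetical and far simpler than the regexes in the linked file.

```r
# Hypothetical helper illustrating the general approach only;
# textshape::split_sentence() handles many more cases than this.
naive_split_sentences <- function(text) {
  placeholder <- "<<DOT>>"

  # Protect periods after a single capital initial, e.g. "Philip K. Dick"
  protected <- gsub("\\b([A-Z])\\.(?=\\s+[A-Z])",
                    paste0("\\1", placeholder), text, perl = TRUE)

  # Protect the period in "No." when a digit follows, e.g. "No.7"
  protected <- gsub("\\b(No)\\.(?=\\s*[0-9])",
                    paste0("\\1", placeholder), protected, perl = TRUE)

  # Split on whitespace that follows ., ? or !
  parts <- strsplit(protected, "(?<=[.?!])\\s+", perl = TRUE)[[1]]

  # Put the protected periods back
  trimws(gsub(placeholder, ".", parts, fixed = TRUE))
}

example <- "I read Philip K. Dick. It was great! Was it No.7? Yes."
print(naive_split_sentences(example))
```

Because nothing here looks at meaning, a sentence that really does end in a single capital letter (for example "So did I. We left.") would be merged by this sketch, which is the kind of trade-off a purely pattern-based splitter has to make.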