Twitter Sentiment Analysis with twitteR, all scores are zero?

I'm new to Twitter sentiment analysis with twitteR, and I used the positive.txt and negative.txt word lists from Hu and Liu. Everything ran without errors, but the scores for over 1000 tweets all turned out to be neutral (score = 0). I can't figure out what went wrong; any help is greatly appreciated!

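    library(twitteR)   # setup_twitter_oauth(), searchTwitter()
    library(plyr)      # laply()
    library(ggplot2)   # qplot()

    setup_twitter_oauth(consumer_key, consumer_secret, token, token_secret)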

    #Get tweets about "House of Cards", due to the limitation, we'll set n=1500
    netflix.tweets<- searchTwitter("#HouseofCards",n=1500)
    tweet=netflix.tweets[[1]]
    tweet$getScreenName()
    tweet$getText()
    netflix.text=laply(netflix.tweets,function(t)t$getText())
    head(netflix.text) 
    write(netflix.text, "HouseofCards_Tweets.txt", ncolumns = 1)

    #loaded the positive and negative.txt from Hu and Liu
    positive <- scan("/users/xxx/desktop/positive_words.txt", what = character(), comment.char = ";")
    negative <- scan("/users/xxx/desktop/negative_words.txt", what = character(), comment.char = ";")

    #add positive words 
    pos.words =c(positive,"miss","Congratulations","approve","watching","enlightening","killing","solid")

    scoredsentiment <- function(hoc.vec, pos.word, neagtive)
    {
        clean <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "",hoc.vec)
        clean <- gsub("^\\s+|\\s+$", "", clean) 
        clean <- gsub("[[:punct:]]", "", clean)
        clean <- gsub("[^[:graph:]]", "", clean)
        clean <- gsub("[[:cntrl:]]", "", clean)
        clean <- gsub("@\\w+", "", clean)
        clean <- gsub("\\d+", "", clean) 
        clean <- tolower(clean)

        hoc.list <- strsplit(clean, "") 
        hoc=unlist(hoc.list)

        pos.matches = match(hoc, pos.words)
        scoredpositive <- sapply(hoc.list, function(x) sum(!is.na(match(pos.matches, positive))))  
        scorednegative <- sapply(hoc.list, function(x) sum(!is.na(match(x, negative))))
        hoc.df <- data.frame(score = scoredpositive - scorednegative, message = hoc.vec, stringsAsFactors = F)
        return (hoc.df)
    }

    twitter_scores <- scoredsentiment(netflix.text, scoredpositive, scorednegative)
    print(twitter_scores)
    write.csv(twitter_scores, file=paste('twitter_scores.csv'), row.names=TRUE)

    #draw a graph to show the final outcome
    hist(twitter_scores$score)
    qplot(twitter_scores$score)

Everything runs, but every tweet gets the same score (score = 0).

There are 2 best solutions below

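From your code, I don't think the simple match will work as written. match only counts exact, whole-word hits, so a tweet scores only when one of its tokens is identical to an entry in the word list; that will not happen often, and some form of fuzzy matching may help. More importantly, you are not comparing words to the list at all: strsplit(clean, "") splits each tweet into single characters rather than words, and pos.matches holds index positions rather than words, so neither the positive nor the negative count can ever be non-zero.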
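To make the point concrete, here is a minimal sketch of a scoring function that tokenizes on whitespace before matching. The tiny word lists and sample tweets are made-up stand-ins for the Hu and Liu files and the downloaded tweets, and the name score_sentiment is only illustrative. Note that it replaces unwanted characters with a space instead of deleting them, and drops the "[^[:graph:]]" step, because that pattern also removes the spaces between words.

```r
# Made-up stand-ins for the Hu and Liu lists and the downloaded tweets
pos.words <- c("good", "great", "love", "solid")
neg.words <- c("bad", "boring", "hate")
tweets <- c("RT @someone: I love #HouseofCards, great season!",
            "So boring... I hate the new episode",
            "Watching it tonight")

score_sentiment <- function(text, pos.words, neg.words) {
  # Replace unwanted pieces with a space so neighbouring words stay separated
  clean <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", text, perl = TRUE)
  clean <- gsub("@\\w+", " ", clean)        # remaining @mentions
  clean <- gsub("[[:punct:]]", " ", clean)  # punctuation
  clean <- gsub("[[:cntrl:]]", " ", clean)  # control characters
  clean <- gsub("\\d+", " ", clean)         # digits
  clean <- tolower(trimws(clean))

  # Split each tweet into words on runs of whitespace (not on "")
  words <- strsplit(clean, "\\s+")

  # Count, per tweet, how many of its words appear in each list
  pos <- sapply(words, function(w) sum(w %in% pos.words))
  neg <- sapply(words, function(w) sum(w %in% neg.words))

  data.frame(score = pos - neg, message = text, stringsAsFactors = FALSE)
}

scores <- score_sentiment(tweets, pos.words, neg.words)
print(scores)
```

With these sample inputs the three tweets score 2, -2 and 0. This still uses exact word matching; if inflected or misspelled forms matter, base R's agrep could be one way to try approximate matching, at the cost of more false hits.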

You can use Microsoft Cognitive Services to calculate the sentiment scores. Its Text Analytics API can detect sentiment, key phrases, topics, and language in your text.

Refer to this link for using Microsoft Cognitive Services in R: link

For sentiment analysis in R