I am new to machine learning. I am classifying tweets into three classes (information, neutral, and metaphor) using term-frequency features. My training and test data are balanced across all classes, with a 70/30 train/test split for each class. However, the per-class accuracy varies wildly: class one (information) is high at 92%, class two (neutral) is moderate at 53%, and class three (metaphor) is very low at 15%. Can anyone tell me what might be wrong with my approach?

This is how I create the term-document matrices:
```r
# Create term-document matrices (transposed to document-term form)
tweets.information.matrix <- t(TermDocumentMatrix(tweets.information.corpus, control = list(wordLengths = c(4, Inf))))
tweets.metaphor.matrix    <- t(TermDocumentMatrix(tweets.metaphor.corpus,    control = list(wordLengths = c(4, Inf))))
tweets.neutral.matrix     <- t(TermDocumentMatrix(tweets.neutral.corpus,     control = list(wordLengths = c(4, Inf))))
tweets.test.matrix        <- t(TermDocumentMatrix(tweets.test.corpus,        control = list(wordLengths = c(4, Inf))))
```
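For anyone unfamiliar with what these matrices hold: a document-term matrix is just a table of term counts per document. A toy base-R equivalent (no `tm` needed; the two "tweets" below are made up for illustration):

```r
# Two tiny example "tweets"
tweets <- c("rain rain today", "sunny today")
tokens <- strsplit(tweets, " ")

# Build a document-term count matrix by tabulating each document's tokens
terms <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tok) table(factor(tok, levels = terms))))

dtm  # rows = documents, columns = term frequencies
```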
and this is how I calculate the per-term probabilities:
```r
probabilityMatrix <- function(docMatrix)
{
  # Sum up the term frequencies
  termSums <- cbind(colnames(as.matrix(docMatrix)), as.numeric(colSums(as.matrix(docMatrix))))
  # Add one (Laplace smoothing)
  termSums <- cbind(termSums, as.numeric(termSums[, 2]) + 1)
  # Calculate the probabilities
  termSums <- cbind(termSums, as.numeric(termSums[, 3]) / sum(as.numeric(termSums[, 3])))
  # Calculate the natural log of the probabilities
  termSums <- cbind(termSums, log(as.numeric(termSums[, 4])))
  # Add readable names to the columns
  colnames(termSums) <- c("term", "count", "additive", "probability", "lnProbability")
  termSums
}
```
To restate: results for information are highly accurate, for neutral moderately accurate, and for metaphor very low.