We are working on a survey that has a few open-ended answers apart from the numeric/categorical responses. Until now we have categorized these texts manually into 10-15 buckets so that the marketing team can act on them. For example, if a respondent is asked what other features they want in the tablet they are using, we group the responses into buckets like 'Better security features', 'Better support', etc.
Instead of doing this manually, I am automating it by building a separate logistic regression/CART/random forest model for each bucket. For example, for bucket one I use the code
model1 <- glm(Better.support ~ ., data = verbatimSparse, family = binomial)
verbatim$predict1 <- predict(model1, type = "response")
I am building 12 other models like this, and each response is assigned to the bucket with the highest predicted probability. This somewhat serves my purpose, but the accuracy is only around 80%. Is there any other method to better classify the text?
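To make the overall procedure concrete, here is a minimal sketch of the one-model-per-bucket approach with the final "highest probability wins" step. The data is simulated, and the names `terms`, `labels`, `probs` and `predicted_bucket` are placeholders for illustration only; in my real data the features come from the sparse term matrix and there are 13 buckets rather than 3.

```r
set.seed(1)
n <- 200

# Hypothetical 0/1 term indicators standing in for the sparse term matrix
terms <- data.frame(
  security = rbinom(n, 1, 0.3),
  support  = rbinom(n, 1, 0.3),
  battery  = rbinom(n, 1, 0.3)
)

# Hypothetical 0/1 bucket labels, each a noisy function of one term
labels <- data.frame(
  Better.security = rbinom(n, 1, plogis(-2 + 3 * terms$security)),
  Better.support  = rbinom(n, 1, plogis(-2 + 3 * terms$support)),
  Better.battery  = rbinom(n, 1, plogis(-2 + 3 * terms$battery))
)

# Fit one logistic regression per bucket and collect the fitted probabilities
# into an n x (number of buckets) matrix, one column per bucket
probs <- sapply(names(labels), function(bucket) {
  d <- cbind(y = labels[[bucket]], terms)
  fit <- glm(y ~ ., data = d, family = binomial)
  predict(fit, type = "response")
})

# Assign each response to the bucket with the highest predicted probability
predicted_bucket <- colnames(probs)[max.col(probs, ties.method = "first")]
table(predicted_bucket)
```

Note that `predict()` without `newdata` scores the same rows the model was fitted on, so any accuracy computed from these probabilities is in-sample.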