Predicted values from polr cover only a small subset of the response values

I want to build a model for ordinal categorical response data with values from 0 to 10 (coded 1 to 11 in the dummy example below), using three predictors, some categorical and some numeric. For this I am using, among other functions, MASS::polr. Here is a dummy example:

library(MASS)  # polr()

data <- data.frame(response = factor(sample.int(11, size = 300, replace = TRUE),
                                     levels = as.character(1:11),
                                     ordered = TRUE),
                   # rep(0:1, 300) has length 600, so data.frame() recycles the
                   # length-300 columns and the result has 600 rows (each row twice)
                   gender = rep(0:1, 300),
                   pred2 = sample.int(11, size = 300, replace = TRUE),
                   age = rpois(300, 30))

The problem is that when I use the predict function with type = "class" and compare the predictions with the observed outcomes, the model seems to use only a few of the response levels, namely the most frequent ones in the training set (in the run below, only level 5, the most frequent training level):

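library(caret)  # createDataPartition()

# stratified 70/30 split: sampling is done within each response level
index <- createDataPartition(data$response, p = 0.7, list = FALSE)

dummy_train <- data[index, ]
dummy_test <- data[-index, ]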

> table(dummy_train$response)
  1  2  3  4  5  6  7  8  9 10 11 
 37 31 48 21 66 37 31 35 45 30 42
model_polr <- polr(response ~ gender + pred2 + age, data = dummy_train, Hess = TRUE)
predict_polr <- predict(model_polr, newdata = dummy_test, type = "class")

> summary(predict_polr)
  1   2   3   4   5   6   7   8   9  10  11 
  0   0   0   0 177   0   0   0   0   0   0 
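For reference, this is the proportional-odds model that MASS::polr fits with its default logistic link, written for this example's 11 levels. The cutpoints ζ_k correspond to model_polr$zeta and the β to model_polr$coefficients (polr has no separate intercept; the cutpoints play that role). predict(..., type = "class") returns, for each row, the level with the largest P(Y = k | x), i.e. the last line below.

$$
\operatorname{logit} P(Y \le k \mid x) = \zeta_k - \eta, \qquad
\eta = \beta_1\,\mathrm{gender} + \beta_2\,\mathrm{pred2} + \beta_3\,\mathrm{age}, \qquad k = 1, \dots, 10,
$$
$$
P(Y = k \mid x) = P(Y \le k \mid x) - P(Y \le k - 1 \mid x), \qquad
P(Y \le 0 \mid x) = 0, \quad P(Y \le 11 \mid x) = 1,
$$
$$
\hat{y}(x) = \arg\max_{k \in \{1, \dots, 11\}} P(Y = k \mid x).
$$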

I am a bit lost: I understand that these are the levels with the highest predicted probability, but I don't see how this kind of prediction is useful. Am I missing something in how I set up the prediction?
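A minimal sketch of what is probably going on, on freshly simulated data (so the counts will not match the tables above). In the dummy data the response is sampled independently of gender, pred2 and age, so the fitted coefficients should be close to zero and the linear predictor barely changes from row to row. Every test row then gets nearly the same vector of class probabilities (type = "probs"), and therefore the same most probable level, which is what type = "class" returns. The block contrasts this with a hypothetical informative data set in which a latent score driven by pred2 is cut into 11 levels. The effect size 2, the logistic noise, the quantile cutpoints and the helper fit_and_predict() are assumptions made up for illustration, and the 70/30 split uses base sample() instead of caret::createDataPartition().

```r
library(MASS)  # polr(); ships with R as a recommended package

set.seed(1)
n <- 300

# Hypothetical helper for this sketch: fit polr on a random 70/30 split and
# return the test-set class probabilities and predicted classes.
fit_and_predict <- function(d) {
  # simple random split; a base-R stand-in for caret::createDataPartition()
  idx <- sample(nrow(d), size = round(0.7 * nrow(d)))
  fit <- polr(response ~ gender + pred2 + age, data = d[idx, ], Hess = TRUE)
  test <- d[-idx, ]
  list(probs = predict(fit, newdata = test, type = "probs"),  # one column per level
       class = predict(fit, newdata = test, type = "class"))
}

# 1) Noise data like the question's (gender = rep(0:1, 150) keeps 300 rows):
#    the response is generated independently of all three predictors.
noise <- data.frame(response = factor(sample.int(11, n, replace = TRUE),
                                      levels = as.character(1:11), ordered = TRUE),
                    gender = rep(0:1, n / 2),
                    pred2 = sample.int(11, n, replace = TRUE),
                    age = rpois(n, 30))

# 2) Hypothetical informative data: a latent score driven by pred2 plus
#    logistic noise, cut at its 1/11, ..., 10/11 quantiles into 11 levels.
pred2 <- sample.int(11, n, replace = TRUE)
latent <- 2 * pred2 + rlogis(n)
informative <- data.frame(
  response = cut(latent, breaks = c(-Inf, quantile(latent, (1:10) / 11), Inf),
                 labels = as.character(1:11), ordered_result = TRUE),
  gender = rep(0:1, n / 2),
  pred2 = pred2,
  age = rpois(n, 30))

res_noise <- fit_and_predict(noise)
res_info <- fit_and_predict(informative)

# Check that type = "class" picks the most probable level in each row of
# type = "probs" (max.col(), with its default ties.method = "random", treats
# entries within a relative 1e-5 of the row maximum as ties).
is_argmax <- function(res) {
  p_max <- apply(res$probs, 1, max)
  p_cls <- res$probs[cbind(seq_len(nrow(res$probs)), as.integer(res$class))]
  all(p_cls >= p_max * (1 - 1e-5))
}
print(c(noise = is_argmax(res_noise), informative = is_argmax(res_info)))

# Number of distinct predicted levels: usually only one or very few for the
# noise data, many more when pred2 actually shifts the latent score.
n_distinct <- c(noise = length(unique(res_noise$class)),
                informative = length(unique(res_info$class)))
print(n_distinct)
```

The single argmax discards most of what the model estimates; the probability matrix from type = "probs" keeps it, and how many distinct levels type = "class" produces depends on how strongly the predictors separate the levels.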
