I have a gold jewelry dataset, and I am trying to predict which item a customer will buy using multinomial logistic regression. `Item` has 4 categories: necklace, bracelet, earrings, and ring.

Here's my code:
```r
# Load the dataset
data <- read.csv("C:/Users/user/Desktop/LamehFinalDataProject.csv")

# Load the necessary libraries
library(nnet)  # for multinom()
library(broom) # for tidy()
library(knitr) # for kable()
library(dplyr) # for data manipulation

# Build the multinomial logistic regression model on the full dataset
my.model <- multinom(Item ~ Age + Gender + Day + Time + Month + Manufacturer +
                       Previous.Purchases + Special.Occasion + Price,
                     data = data)

# Tidy the model and calculate odds ratios
tidy_result <- tidy(my.model, exponentiate = FALSE, conf.int = TRUE)
tidy_result_with_odds_ratios <- tidy_result %>%
  mutate(odds_ratio = exp(estimate))
kable(tidy_result_with_odds_ratios, digits = 3, format = "markdown")

# Create a 70/30 train-validation split
set.seed(123) # for reproducibility
split <- sample(1:2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[split == 1, ]
valid <- data[split == 2, ]

# Fit the model on the training set
my.model <- multinom(Item ~ Age + Gender + Day + Time + Month + Manufacturer +
                       Previous.Purchases + Special.Occasion + Price,
                     data = train)

# Make predictions on the validation set
predictions <- predict(my.model, newdata = valid, type = "class")
predictions <- as.factor(predictions)

# Load the caret package for performance metrics
library(caret)

# Calculate the confusion matrix
conf_matrix <- confusionMatrix(predictions, valid$Item)
print(conf_matrix)
```
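For comparison, here is a base-R sketch of the kind of confusion matrix I expect to get. It uses hypothetical toy labels, not rows from my dataset, and assumes an alphabetical level order for `Item`. Both vectors are built with the same explicit `levels`, which matches caret's own wording that `data` and `reference` should be factors with the same levels. `table()` then cross-tabulates them.

```r
# Assumed level set for Item (alphabetical order is my assumption)
item_levels <- c("bracelet", "earrings", "necklace", "ring")

# Hypothetical toy labels, not taken from the dataset
actual    <- c("ring", "necklace", "ring", "earrings")
predicted <- c("ring", "ring",     "ring", "earrings")

# Force both vectors onto the same set of levels
actual_f    <- factor(actual,    levels = item_levels)
predicted_f <- factor(predicted, levels = item_levels)

# Rows are predictions, columns are the true classes
cm <- table(Predicted = predicted_f, Actual = actual_f)
print(cm)

# Overall accuracy: correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)  # 0.75 (3 of 4 toy labels match)
```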
The code throws an error in the last two lines, when I compute the confusion matrix. This is the error I get:

```
Error in confusionMatrix.default(predictions, as.factor(valid$Item)): The data must contain some levels that overlap the reference.
```
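If I read the message correctly, the error is raised when none of the levels of `predictions` appear among the levels of the reference. The base-R sketch below reproduces that condition with two hypothetical factors that have no levels. It is my paraphrase of the check, not caret's code.

```r
# Two hypothetical factors with no levels at all
pred <- factor(character(0))
ref  <- factor(character(0))

# Does any predicted level also appear among the reference levels?
overlap <- any(levels(pred) %in% levels(ref))
print(overlap)  # FALSE: any() of an empty logical vector is FALSE
```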
When I check the levels, both come back empty:

```
> levels(valid$Item)
character(0)
> levels(predictions)
character(0)
```
I also checked the class of both:

```
> class(valid$Item)
[1] "factor"
> class(predictions)
[1] "factor"
```
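As far as I can tell, a vector can have class `factor` and still have no levels. The base-R sketch below uses hypothetical toy vectors, not my data, and reproduces the same `character(0)` output in two ways: one factor is zero-length and the other is entirely `NA`. The commented lines at the end are checks I could run on `valid` and `predictions`. I have not included their output here.

```r
# Hypothetical toy vectors (not from my dataset)
empty_factor  <- factor(character(0))  # zero-length vector
all_na_factor <- factor(c(NA, NA, NA)) # three values, all missing

# Both are factors, yet neither has any levels
print(class(empty_factor))    # "factor"
print(levels(empty_factor))   # character(0)
print(levels(all_na_factor))  # character(0)
print(length(all_na_factor))  # 3

# Checks to run on the real objects from the code above:
# nrow(valid)
# table(valid$Item, useNA = "ifany")
# length(predictions)
```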