I am running into the error below on my cm.glm line:
Error: `data` and `reference` should be factors with the same levels.
# Predict using Logistic Regression
pred.glm <- ifelse(predict(fit.glm, irisTest) > 0.5, "setosa", "other")
cm.glm <- confusionMatrix(pred.glm, (irisTest$Species))
acc.glm <- cm.glm$overall['Accuracy']
prec.glm <- cm.glm$byClass['Pos Pred Value']
rec.glm <- cm.glm$byClass['Sensitivity']
Here is my full code:
# Load libraries
library(MASS)
library(caret)
library(nnet)
# Loading iris dataset
data(iris)
# Convert it into a binary class dataset
iris$Species <- ifelse(iris$Species == "setosa", "setosa", "other")
# Split the dataset
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = .8,
                                  list = FALSE,
                                  times = 1)
irisTrain <- iris[ trainIndex,]
irisTest <- iris[-trainIndex,]
# Fit Logistic Regression
fit.glm <- multinom(Species ~ ., data = iris)
# Predict using Logistic Regression
pred.glm <- ifelse(predict(fit.glm, irisTest) > 0.5, "setosa", "other")
cm.glm <- confusionMatrix(pred.glm, (irisTest$Species))
acc.glm <- cm.glm$overall['Accuracy']
prec.glm <- cm.glm$byClass['Pos Pred Value']
rec.glm <- cm.glm$byClass['Sensitivity']
You have two issues with your code.
First, predict() by default returns the most likely class, not a probability. To compare against a cutoff such as 0.5, you have to pass type = "probs" as an argument.
Second, confusionMatrix() expects its two arguments to be factors with the same levels. Just convert your vectors to factors. In case one of the sets does not contain both levels (this could happen with other seeds), specify the factor levels explicitly.
Your two lines of code should then look like this:
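A minimal sketch of those two lines, placed inside a condensed copy of the setup from the question so that it runs on its own. The lvls helper and the level order c("setosa", "other") are assumptions chosen for illustration; caret treats the first level as the positive class unless told otherwise. trace = FALSE is only there to silence the fitting log.

```r
library(caret)  # createDataPartition(), confusionMatrix()
library(nnet)   # multinom()

# Condensed setup from the question
data(iris)
iris$Species <- ifelse(iris$Species == "setosa", "setosa", "other")
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = .8, list = FALSE, times = 1)
irisTest <- iris[-trainIndex, ]
fit.glm <- multinom(Species ~ ., data = iris, trace = FALSE)

# Assumption: one fixed level order shared by predictions and reference,
# so both factors match even if a class is missing from one of them
lvls <- c("setosa", "other")

# type = "probs" returns probabilities instead of class labels; with two
# classes this is the probability of the second level, here "setosa"
pred.glm <- ifelse(predict(fit.glm, irisTest, type = "probs") > 0.5, "setosa", "other")

# Both arguments are now factors with identical levels
cm.glm <- confusionMatrix(factor(pred.glm, levels = lvls),
                          factor(irisTest$Species, levels = lvls))
```

The remaining lines from the question (acc.glm, prec.glm, rec.glm) should work unchanged on this cm.glm.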