SVM performance not consistent with AUC score

Asked by A Jorg on 12 February 2022 at 15:35

I have a dataset that contains information about patients. It includes several variables and each patient's clinical status (0 if healthy, 1 if sick). I have tried to fit an SVM model to predict patient status from these variables.

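library(e1071)
library(ROCR)  # provides prediction() and performance()

# order the training rows by class label
Index <- order(Ytrain, decreasing = FALSE)

SVMfit_Var <- svm(Xtrain[Index, ], Ytrain[Index],
                  type = "C-classification", gamma = 0.005,
                  probability = TRUE, cost = 0.001, epsilon = 0.1)

# class probabilities for the test set (first probability column)
preds1 <- predict(SVMfit_Var, Xtest, probability = TRUE)
preds1 <- attr(preds1, "probabilities")[, 1]

# ROC AUC on the test rows that have a known label
samples <- !is.na(Ytest)
pred <- prediction(preds1[samples], Ytest[samples])
AUC <- performance(pred, "auc")@y.values[[1]]

# predicted class labels and confusion matrix
prediction <- predict(SVMfit_Var, Xtest)
xtab <- table(Ytest, prediction)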

To test the performance of the model, I calculated the ROC AUC, and on the validation set I obtain an AUC of 0.997. But when I look at the predicted classes, every patient has been classified as healthy.

AUC = 0.997
> xtab
     prediction
Ytest  0  1
    0 72  0
    1 52  0

Can anyone help me with this problem?


Kat On 13 February 2022 at 03:32 BEST ANSWER

Did you look at the probabilities versus the fitted values? You can read about how probability works with SVM here.
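A minimal sketch of that comparison, assuming iris without setosa as a stand-in for the patient data (the names df, fit and pr are made up for this example). It puts the predicted label, the decision value, and the class probabilities side by side, and prints the column names so you can check which class column 1 actually refers to:

```r
library(e1071)

# assumption: a two-class subset of iris stands in for the patient data
df <- droplevels(subset(iris, Species != "setosa"))

# same hyperparameters as in the question
fit <- svm(df[, -5], df$Species, type = "C-classification",
           gamma = 0.005, cost = 0.001, probability = TRUE)

# ask for labels, decision values and probabilities in one call
pr <- predict(fit, df[, -5], probability = TRUE, decision.values = TRUE)
probs <- attr(pr, "probabilities")

# which class does each probability column belong to?
colnames(probs)

# predicted label, decision value and probabilities side by side
head(data.frame(label = as.character(pr),
                decision = attr(pr, "decision.values")[, 1],
                probs))

# how many observations fall into each predicted class
table(predicted = pr)
```

If the label column never changes while the probabilities still vary from row to row, the scores may rank the patients well even though no one crosses the decision boundary.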

If you want to look at the performance, you can use the function Conf from the library DescTools or the function confusionMatrix from the library caret. (They provide the same output.)

library(DescTools)
library(caret)

# training performance with DescTools
# arguments: svm.model$fitted, then the y-values used for training
Conf(table(SVMfit_Var$fitted, Ytrain[Index]))

# training performance with caret
# arguments: svm.model$fitted, then the y-values
# (if the y-values aren't factors, use as.factor())
confusionMatrix(SVMfit_Var$fitted, as.factor(Ytrain[Index]))

# testing performance with DescTools
# compared with the `table()` call in your question, you must flip the order:
# predicted first, then actual values
Conf(table(prediction, Ytest))

# and for caret
confusionMatrix(prediction, as.factor(Ytest))
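For a quick check without extra packages, the headline numbers can be worked out by hand. This is only a sketch in base R using the table printed in the question. Treating class 1 (sick) as the positive class is my assumption, and caret may pick a different positive class by default:

```r
# confusion matrix reported in the question (rows: actual, columns: predicted)
xtab <- matrix(c(72, 52, 0, 0), nrow = 2,
               dimnames = list(Ytest = c("0", "1"), prediction = c("0", "1")))

accuracy    <- sum(diag(xtab)) / sum(xtab)         # correct predictions / all
sensitivity <- xtab["1", "1"] / sum(xtab["1", ])   # share of sick patients found
specificity <- xtab["0", "0"] / sum(xtab["0", ])   # share of healthy patients found
nir         <- max(rowSums(xtab)) / sum(xtab)      # no-information rate

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, NIR = nir), 3)
# accuracy 0.581, sensitivity 0.000, specificity 1.000, NIR 0.581
```

Accuracy equal to the no-information rate means the classifier does no better than always guessing the majority class, which is what a table with an empty column looks like.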

Your question isn't reproducible, so I went through this with the iris data. The probability was the same for every observation. I included the code so you can see this with another data set.

library(e1071)
library(ROCR)
library(caret)
library(dplyr)  # provides %>% and filter()

data("iris")

# make it binary
df1 <- iris %>% filter(Species != "setosa") %>% droplevels()
# check the subset
summary(df1)

set.seed(395) # keep the sample repeatable
tr <- sample(1:nrow(df1), size = 70, # 70%
             replace = FALSE)

# create the model
svm.fit <- svm(df1[tr, -5], df1[tr, ]$Species,
               type = "C-classification",
               gamma = .005, probability = TRUE,
               cost = .001, epsilon = .1)

# look at probabilities
# this shows EVERY row has the same outcome probability distro
pb.fit <- predict(svm.fit, df1[-tr, -5], probability = TRUE)
pb.fit <- attr(pb.fit, "probabilities")[, 1]

# look at performance 
performance(prediction(pb.fit, df1[-tr, ]$Species), "auc")@y.values[[1]]
# [1] 0.03555556  that's abysmal!! 

# test the model
p.fit <- predict(svm.fit, df1[-tr, -5])
confusionMatrix(p.fit, df1[-tr, ]$Species)
# 93% accuracy with NIR at 50%... the AUC score was not useful

# check the trained model performance
confusionMatrix(svm.fit$fitted, df1[tr, ]$Species)
# 87%, with NIR at 50%... that's really good
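One possible way to read both results (the 0.997 in the question and the 0.036 above) is that AUC only measures how the scores rank the observations, while the confusion matrix depends on where the cutoff falls. A small base R sketch with made-up scores illustrates this. The helper auc_rank is hypothetical (a rank-based AUC in the Mann-Whitney form), not a function from ROCR or e1071:

```r
# hypothetical helper: rank-based AUC (Mann-Whitney form)
auc_rank <- function(score, is_pos) {
  r <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
y <- rep(c(0, 1), each = 50)

# made-up scores: positives always score higher, but nobody reaches 0.5
score <- c(runif(50, 0.10, 0.20), runif(50, 0.25, 0.35))

auc_rank(score, y == 1)      # 1: every positive outranks every negative
auc_rank(1 - score, y == 1)  # 0: same information, the other class's column

# a 0.5 cutoff still assigns everyone to class 0
table(actual = y, predicted = as.integer(score > 0.5))
```

So an AUC near 1 with an empty column in the confusion matrix is not a contradiction, and an AUC near 0 usually means the probability column belongs to the other class: 1 - 0.0356 is about 0.964. Checking colnames() on the "probabilities" attribute before taking [, 1] would settle which case applies.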
