Problem with ROC curves for an SVM on simulated data


I'm working on simulated data and running into a problem while trying to tune the SVM parameters.

library(e1071)  
library(ROCR)  
set.seed(10)  

# Function to generate data: class 1 inside the triangle below both
# lines x2 = 2*x1 and x2 = 2 - 2*x1, class -1 everywhere else
generate.data <- function(n){
  x2 <- runif(n)
  x1 <- runif(n)
  y <- as.factor(ifelse((x2 > 2*x1) | (x2 > (2 - 2*x1)), -1, 1))
  return(data.frame(x1, x2, y))
}

# Training set (n = 500) and test set (n = 200)
dtrain <- generate.data(500)  
dtest <- generate.data(200)  

I performed cross-validation on the training set and, with the radial kernel, obtained the parameters cost = 1000 and gamma = 0.1.

# Cross-validated grid search over cost and gamma
tune.out <- tune(svm, y ~ x1 + x2, data = dtrain, kernel = "radial",
                 ranges = list(cost = c(0.1, 1, 10, 100, 1000),
                               gamma = c(0.01, 0.1, 1, 10, 100)))

# Refit with the selected parameters (probability = TRUE for ROC curves later)
svmbestmod <- svm(y ~ x1 + x2, data = dtrain, kernel = "radial",
                  cost = 1000, gamma = 0.1, probability = TRUE)
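To double-check which parameters the cross-validation actually selected (rather than hard-coding them), the tune object can be inspected directly:

# Inspect the cross-validation results and the selected parameters
summary(tune.out)
tune.out$best.parameters   # should report cost = 1000, gamma = 0.1

(tune.out$best.model is the same model already refit on the training set, though without probability = TRUE.)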

When I predict on the test set I get 0 error, and I don't understand why.

# Predictions on the test set
yrad.test <- predict(svmbestmod, dtest)

# Confusion matrix
mc.rad <- table(dtest$y, yrad.test)
print(mc.rad)

# Misclassification error
err.rad <- 1 - sum(diag(mc.rad)) / sum(mc.rad)
print(err.rad)
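Since the title mentions ROC curves and ROCR is already loaded, here is a sketch of how a curve could be drawn from the fitted model; it relies on probability = TRUE in the fit and treats "1" as the positive class:

# Probability of class "1" for each test point
yprob <- attr(predict(svmbestmod, dtest, probability = TRUE),
              "probabilities")[, "1"]

# ROC curve and AUC via ROCR
pred.roc <- prediction(yprob, dtest$y)
perf.roc <- performance(pred.roc, "tpr", "fpr")
plot(perf.roc)
performance(pred.roc, "auc")@y.values[[1]]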

If someone could help me understand what's going wrong, that would be great.

Best answer:

I put 20,000 points in the test set to get enough misclassified examples.
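That is, presumably regenerating the test set and predictions with the same code as in the question:

# Larger test set so that some points fall near the decision boundary
dtest <- generate.data(20000)
yrad.test <- predict(svmbestmod, dtest)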

# First, isolate any misclassified points in the test set
library(dplyr)
errors <- cbind(dtest, yrad.test) %>% dplyr::filter(y != yrad.test)
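(The same filtering can also be done in base R, without dplyr:)

# Base-R equivalent of the dplyr filter above
errors <- subset(cbind(dtest, yrad.test), y != yrad.test)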

# Then plot all the points in the train set,
# coloured by their respective class, while misclassified
# entries in the test set are shown in black

library(ggplot2)
p <- ggplot2::ggplot(data = dtrain, aes(x1, x2)) +
  geom_point(aes(colour = factor(y))) +
  geom_point(data = errors, colour = "black")
p

[Plot: misclassified test-set points shown in black]

It seems to me that your data is completely separable: it is essentially too good to be true, so the model can make perfect predictions. You could add some noise to the formula that generates it; see the sketch below.
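For example, a minimal sketch that flips each label with some small probability (the 10% flip rate is arbitrary):

# Variant of generate.data that flips each label with probability p
generate.noisy.data <- function(n, p = 0.1){
  x2 <- runif(n)
  x1 <- runif(n)
  y <- ifelse((x2 > 2*x1) | (x2 > (2 - 2*x1)), -1, 1)
  flip <- runif(n) < p          # which labels to flip
  y[flip] <- -y[flip]
  data.frame(x1, x2, y = as.factor(y))
}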

Also, since your test data contains only 200 entries, it's quite possible that none of them falls close enough to the decision boundary to be misclassified. As mentioned, I had to generate a test set of 20,000 points to get the roughly 200 misclassified points you see in the plot.