I'm working with simulated data and running into a problem. I'm trying to tune the parameters of an SVM.
library(e1071)
library(ROCR)
set.seed(10)
# Function to generate data
generate.data <- function(n){
  x2 <- runif(n)
  x1 <- runif(n)
  # Label -1 above the tent-shaped boundary x2 = min(2*x1, 2 - 2*x1), else 1
  y <- as.factor(ifelse((x2 > 2*x1) | (x2 > (2 - 2*x1)), -1, 1))
  return(data.frame(x1, x2, y))
}
# Training set: n = 500; test set: n = 200
dtrain <- generate.data(500)
dtest <- generate.data(200)
I performed cross-validation on the training set, and with the radial kernel the best parameters were cost=1000 and gamma=0.1.
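# Grid search over cost and gamma (tune() uses 10-fold cross-validation by default)
tune.out <- tune(svm, y~x1+x2, data=dtrain, kernel="radial",
                 ranges=list(cost=c(0.1,1,10,100,1000), gamma=c(0.01,0.1,1,10,100)))
# Refit with the best parameters found by tune()
svmbestmod <- svm(y~x1+x2, data=dtrain, kernel="radial", cost=1000, gamma=0.1,
                  probability=TRUE)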
When I predict on my test set, I get an error rate of 0, and I don't understand why.
yrad.test <- predict(svmbestmod, dtest)
# Confusion matrix (rows: true labels, columns: predictions)
mc.rad <- table(dtest$y, yrad.test)
print(mc.rad)
# Test error rate
err.rad <- 1 - sum(diag(mc.rad))/sum(mc.rad)
print(err.rad)
I'd appreciate any help understanding what's wrong.
I used 20000 points in the test set.
It seems to me that your data is perfectly separable: the label is a deterministic function of x1 and x2, so the data is too good to be true and your model can make perfect predictions. You could add some noise to the formula that generates it.
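A minimal sketch of the noise idea, assuming label noise (flipping each label at random) is the kind of noise you want. `generate.noisy.data()` and `flip.prob` are hypothetical names, not from the question. The decision rule is the same as in `generate.data()`; only the labels are perturbed, so some test points carry labels no classifier can recover and you would expect a test error of roughly `flip.prob` or more.

```r
# Hypothetical variant of generate.data(): same decision rule, but each
# label is flipped independently with probability `flip.prob` (an assumed
# noise model, not part of the original question).
generate.noisy.data <- function(n, flip.prob = 0.1) {
  x2 <- runif(n)
  x1 <- runif(n)
  # Noise-free label from the original rule
  clean <- ifelse((x2 > 2 * x1) | (x2 > (2 - 2 * x1)), -1, 1)
  # Flip each label with probability flip.prob
  flip <- runif(n) < flip.prob
  noisy <- ifelse(flip, -clean, clean)
  data.frame(x1, x2, y = factor(noisy, levels = c(-1, 1)))
}

set.seed(10)
d <- generate.noisy.data(500, flip.prob = 0.1)
# Share of labels that disagree with the noise-free rule (should be near 0.1)
clean <- ifelse((d$x2 > 2 * d$x1) | (d$x2 > (2 - 2 * d$x1)), -1, 1)
flipped.share <- mean(as.numeric(as.character(d$y)) != clean)
print(flipped.share)
```

Other noise models (e.g. adding Gaussian noise to x2 before applying the rule) would work too; label flipping is just the simplest to reason about.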
Also, if your test set contains only 200 points, it is quite possible that none of them lies close enough to the decision boundary to be misclassified. As I mentioned, I had to generate a test set of 20000 points to get the roughly 200 misclassified points you see in the picture.
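A rough way to see the sample-size effect, using only base R: count how many uniformly drawn points fall within a thin band around the true boundary x2 = min(2*x1, 2 - 2*x1). This treats "close to the boundary" as a proxy for "at risk of being misclassified", which is an assumption; the helper names and the band half-width `eps` are hypothetical. The band has area of about 2 × eps, so with eps = 0.005 only about 1% of points land in it: a couple out of 200, but on the order of 200 out of 20000.

```r
# Hypothetical helper: vertical distance from each point to the boundary
# x2 = min(2*x1, 2 - 2*x1) implied by generate.data()'s labelling rule.
boundary.gap <- function(x1, x2) abs(x2 - pmin(2 * x1, 2 - 2 * x1))

# Count points within `eps` (an arbitrary, assumed band half-width)
# of the boundary, for n points drawn uniformly on the unit square.
count.near.boundary <- function(n, eps = 0.005) {
  x1 <- runif(n)
  x2 <- runif(n)
  sum(boundary.gap(x1, x2) < eps)
}

set.seed(10)
near.200   <- count.near.boundary(200)
near.20000 <- count.near.boundary(20000)
print(c(n200 = near.200, n20000 = near.20000))
```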