R package e1071: unrelated column causes SVM tuning to crash


In the R code below, I would have expected the junk column to have no effect on the SVM calculations, as the formula "true~aaaa+bbbb+cccc" clearly excludes it.

However, deleting the statement "data$junk <- NULL" causes the SVM calculation to crash. The code as listed below runs fine. It returns:

Parameter tuning of ‘e1071::svm’:

- sampling method: 5-fold cross validation 

- best parameters:
 gamma cost
   0.5    4

- best performance: 0.17 

Here is the code:

options( error = function() {
        traceback( 2 )
        options( error = NULL )
        stop( "exiting after script error" )
})

data <- data.frame(
  true = as.factor( c( "a", "b", "b", "a", "a", "b", "a", "b", "b", "b", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a", "a", "a" ) ),
  aaaa = c( 2, 1, 0, 0, 3, 0, 0, 1, 0, 0, 5, 1, 2, 0, 7, 2, 1, 0, 1, 2, 3, 4 ),
  bbbb = c( 4, 0, 0, 0, 4, 1, 4, 1, 0, 0, 6, 1, 1, 4, 0, 2, 0, 2, 1, 2, 3, 4 ),
  cccc = c( 3, 2, 0, 0, 3, 1, 4, 1, 2, 0, 0, 7, 1, 3, 5, 2, 2, 1, 1, 2, 3, 4 ),
  junk = c(NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
)

data$junk <- NULL # Leave this in or take this statement out.

str( data ) # str() prints the structure itself; wrapping it in print() only adds a stray NULL

e1071::tune( e1071::svm,
  true~aaaa+bbbb+cccc,
  data = data,
  type = 'C-classification',
  scale = TRUE,
  ranges = list( gamma = 2^(-1:1), cost = 2^(2:4) ),
  tunecontrol = e1071::tune.control( sampling = 'cross', cross = 5 )
)
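
One possible workaround, sketched under the assumption that the NA in the junk column is what triggers the failure (for example, if rows containing any missing value are dropped somewhere during cross-validation regardless of the formula), is to trim the data frame to the variables named in the formula before tuning. The helper name keep_formula_vars and the small stand-in data frame are hypothetical and only serve as an illustration; the sketch uses base R only and does not work for formulas containing `.`.

```r
# Hypothetical helper (not part of e1071): keep only the columns
# that the formula actually names, response included.
keep_formula_vars <- function( formula, data ) {
  data[ , all.vars( formula ), drop = FALSE ]
}

model_formula <- true ~ aaaa + bbbb + cccc

# Small stand-in for the data frame in the question, with the NA in junk.
example <- data.frame(
  true = factor( c( "a", "b", "b", "a" ) ),
  aaaa = c( 2, 1, 0, 0 ),
  bbbb = c( 4, 0, 0, 0 ),
  cccc = c( 3, 2, 0, 0 ),
  junk = c( NA, 1, 1, 1 )
)

trimmed <- keep_formula_vars( model_formula, example )

# Count rows without missing values before and after trimming.
sum( complete.cases( example ) )  # 3: the row with the NA in junk is incomplete
sum( complete.cases( trimmed ) )  # 4: junk is gone, so every row is complete
```

Passing the trimmed data frame as data = to e1071::tune would then have the same effect as the manual "data$junk <- NULL" statement, without having to name the unused columns one by one.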

Version information

$ Rscript --version
Rscript (R) version 4.3.2 (2023-10-31)
$ R
installed.packages()
e1071                "4.3.2"
