Fitting bartMachine in Caret getting argument of length zero/Incorrect Number of dimensions

858 Views Asked by At

I am trying to model a good example of bartMachine usage in Caret, and I can't seem to model a bartMachine with Caret correctly, can anyone tell me, what is exactly the main error means? or is there a simple reproducible code for BART Modeling?

Here is the snippets I use to model a bartMachine using some dummy data of HouseVotes84 and cars dataset:

library(mlbench)
library(caret)

data("HouseVotes84")
#Using HouseVotes84 as Classification Task Dataset and mtcars as Regression Task Dataset
dummy_data_classif <- HouseVotes84[,2:length(colnames(HouseVotes84))] %>% 
  mutate_if(is.factor, as.numeric)
dummy_data_classif <- data.frame(cbind(Class=HouseVotes84[,1], dummy_data_classif))
dummy_data_classif[is.na(dummy_data_classif)] <- 0

data("cars")
dummy_data_regr <- cars

caret_method_tester <- function(dummy_data, formula, resample_plan=1, 
                                test_method, time_limit=30, 
                                grid_param=c(), parallel_mode=FALSE){
  library(caret)
  library(R.utils)
  formula <- as.formula(formula)
  resampling <- NULL
  if(resample_plan==1){
    resampling <- trainControl(method = "repeatedcv",
                               number = 10,
                               repeats = 5,
                               allowParallel = parallel_mode) 
  }
  else if(resample_plan==2){
    resampling <- trainControl(method = "cv",
                               number = 5,
                               allowParallel = parallel_mode) 
  }
  else if(resample_plan==3){
    resampling <- trainControl(method = "adaptive_cv",
                               number = 10, repeats = 5,
                               allowParallel = parallel_mode,
                               adaptive = list(min = 3, alpha = 0.05, 
                                               method = "BT", complete = FALSE))
  }
  else if(resample_plan==4){
    resampling <- trainControl(method = "boot",
                               number = 5,
                               allowParallel = parallel_mode)
  }
  else if(resample_plan==5){
    resampling <- trainControl(method = "boot_all",
                               number = 5,
                               allowParallel = parallel_mode)
  }
  tryCatch(
    expr={
      if(length(grid_param) > 0){
        withTimeout(
          model <- caret::train(formula, 
                       data = dummy_data, 
                       method = test_method, 
                       trControl = resampling,
                       tuneGrid=grid_param), timeout = 300
        )
      }
      else{
        withTimeout(
          model <- caret::train(formula, 
                                data = dummy_data, 
                                method = test_method, 
                                trControl = resampling), timeout=300   
        )
        
      }
      return(model)
    },
    error=function(cond){
      message("Test Model Failed")
      message("Here's the original error message:")
      message(cond)
      return(NULL)
    },
    warning=function(cond){
      message("Warning Triggered!")
      message("Here's the original warning message:")
      message(cond)
      return(model)
    }
  )
}

bart_reg <- caret_method_tester(dummy_data_regr, "Price ~ .", 
                test_method="bartMachine", time_limit=30, resample_plan=2)

Test Model Failed
Here's the original error message:
argument is of length zero

bart_classif <- caret_method_tester(dummy_data_classif, "Class ~ .", 
                test_method="bartMachine", time_limit=30, resample_plan=2)

Test Model Failed
Here's the original error message:
incorrect number of dimensions

I used try Catch method to easily notify things about the code progress, so it is clear when the code fails, issues warning, or is successful.

the dataset also doesn't have any NA Values as far as I am concerned

1

There are 1 best solutions below

2
On BEST ANSWER

Would be much better if you reduce the code to the essential part, basically the train function with bartMachine doesn't work. We can illustrate that with this example and we get the same error message:

mdl = train(mpg ~ .,data=mtcars,method="bartMachine",trControl=trainControl(method="cv"))
Error in if (grepl("adaptive", trControl$method) & nrow(tuneGrid) == 1) { : 
  argument is of length zero

The error is a bug with the code in caret, if you don't provide the tuning grid, the default function used to create it does not return a data.frame:

getModelInfo()$bartMachine$grid
function(x, y, len = NULL, search = "grid") {
                    if(search == "grid") {
                      out <- expand.grid(num_trees = 50,
                                         k = (1:len)+ 1,
                                         alpha = seq(.9, .99, length = len),
                                         beta = seq(1, 3, length = len),
                                         nu =  (1:len)+ 1)
                    } else {
                      out <- data.frame(num_trees = sample(10:100, replace = TRUE, size = len),
                                        k = runif(len, min = 0, max = 5),
                                        alpha = runif(len, min = .9, max = 1),
                                        beta = runif(len, min = 0, max = 4),
                                        nu = runif(len, min = 0, max = 5))
                    }
                    if(is.factor(y)) {
                      out$k <- NA
                      out$nu <- NA
                    }
                  }

You can either provide a tune grid:

mdl = train(mpg ~ .,data=mtcars,method="bartMachine",
trControl=trainControl(method="boot"),
tuneGrid=data.frame(num_trees=50,k=3,alpha=0.1,beta=0.1,nu=4))

mdl

Bayesian Additive Regression Trees 

32 samples
10 predictors

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 32, 32, 32, 32, 32, 32, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  2.826126  0.8344417  2.292464

Tuning parameter 'num_trees' was held constant at a value of 50
 'beta' was held constant at a value of 0.1
Tuning parameter 'nu' was
 held constant at a value of 4

Or you fix the function above and create a new method, you can read more here:

newBartMachine = getModelInfo()$bartMachine

newBartMachine$grid = function(x, y, len = NULL, search = "grid") {
                    if(search == "grid") {
                      out <- expand.grid(num_trees = 50,
                                         k = (1:len)+ 1,
                                         alpha = seq(.9, .99, length = len),
                                         beta = seq(1, 3, length = len),
                                         nu =  (1:len)+ 1)
                    } else {
                      out <- data.frame(num_trees = sample(10:100, replace = TRUE, size = len),
                                        k = runif(len, min = 0, max = 5),
                                        alpha = runif(len, min = .9, max = 1),
                                        beta = runif(len, min = 0, max = 4),
                                        nu = runif(len, min = 0, max = 5))
                    }
                    if(is.factor(y)) {
                      out$k <- NA
                      out$nu <- NA
                    }
                    return(out)
                  }
mdl = train(mpg ~ .,data=mtcars,method=newBartMachine,trControl=trainControl(method="cv"),tuneLength=1)

Bayesian Additive Regression Trees 

32 samples
10 predictors

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 28, 28, 28, 29, 30, 30, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  2.338429  0.9581958  2.057181

Tuning parameter 'num_trees' was held constant at a value of 50
 'beta' was held constant at a value of 1
Tuning parameter 'nu' was
 held constant at a value of 2