Dirichlet Regression using Caret package

178 Views Asked by At

I am attempting to predict tree species composition using Sentinel 2A imagery and forest plot data. I have calculated the proportion of basal area (the cross-sectional area trees of a given species occupy within the plot divided by the total cross-sectional area all tree occupy with in a plot) from the forest plot data and want to predict proportion of basal area by species across the landscape as a raster using the intensity values from Sentinel 2A. Dirichlet regression seems appropriate here, because I have more than 2 categories of proportional data bounded on (0,1) where the summed proportions for each observational unit must be equal to 1. Therefore, I want to model the joint composition of proportional basal area as a function of spectral reflectance intensity using Dirichlet regression, with k-folds crossvalidation using 10 folds and 5 repeats. This seems like a modeling exercise perfectly suited to caret::train() using a custom function. I followed the documentation for how to build a custom routine for DirichletReg::DirichReg(), but I am a bit stumped on how to feed a multivariate response variable to caret::train(). The DirichletReg::DirichReg() function requires that response variable be formatted with the DirichletReg::DR_data() function first prior to modeling. However, when I feed a Dr_data object to caret::train(), I get this error:

Error: wrong model type for classification

I think this error is getting thrown because I have passed a multivariate response variable to the y= argument of caret::train() and the routine thinks that can only happen with classification algorithms. Does anyone have experience doing multivariate regression? Is there another way to change my custom routine so that caret::train() will accept multivariate response variables?

Below is my reproducible example:

##Loading Necessary Packages##
library(caret)
library(DirichletReg)

##Creating Fake Data##
set.seed(88)#For reproducibility

#Response variables#
PSME_BA<-rnorm(25,50, 15)
TSHE_BA<-rnorm(25,40,12)
ALRU2_BA<-rnorm(25,20,0.5)
Total_BA<-PSME_BA+TSHE_BA+ALRU2_BA

#Predictor variables#
B1<-runif(25, 0, 2000)
B2<-runif(25, 0, 1800)
B3<-runif(25, 0, 3000)

#Dataset for modeling#
df<-data.frame(PSME=PSME_BA/Total_BA, TSHE=TSHE_BA/Total_BA, ALRU2=ALRU2_BA/Total_BA,
               B1=B1, B2=B2, B3=B3)

##Creating a Dirichlet regression modeling routine to feed to caret::train()##
#Method list#
Dreg <- list(type='Regression', 
             library='DirichletReg',
             loop=NULL)

#Parameters element#
prm <- data.frame(parameter=c("model"),
                  class="character")
Dreg$parameters <- prm

#Grid element#
DregGrid  <- function(x, y, len=NULL, search="grid"){
  if(search == "grid"){
    out <- expand.grid(model=c("common", "alternative"), stringsAsFactors = F) # here force the strings as character,
    # othewise get error that the model arguments
    # were expecting 'chr' when fitting
  }
  out
}
Dreg$grid <- DregGrid

#Fit element#
DregFit <- function(x, y, param, ...){
  dat <- if(is.data.frame(x)) x else as.data.frame(x)
  dat$.outcome <- y
  theDots <- list(...)
  modelArgs <- c(list(formula = as.formula(".outcome ~ ."), data = dat, link=param$model, type=param$type), theDots)
  out <- do.call(DirichletReg::DirichReg, modelArgs)
  out$call <- NULL
  out
}
Dreg$fit <- DregFit

#Predict element#
DregPred <- function(modelFit, newdata, preProc=NULL, submodels=NULL){
  if(!is.data.frame(newdata)) newdata <- as.data.frame(newdata)
  DirichletReg::predict.DirichletRegModel(modelFit, newdata)
}
Dreg$predict <- DregPred

#prob element#
DregProb <- function(){
  return(NULL)
}
Dreg$prob <- DregProb

##Modeling the data using Dirichlet regression with repeated k-folds cross validation##
trCrtl<-trainControl(method="repeatedcv", number = 10, repeats = 5)
Y<-DR_data(df[,c(1:3)])#Converting the response variables to DR_data format
mod<-train(x=df[,-c(1:3)],Y, method=Dreg,trControl=trCrtl)#Throws error
0

There are 0 best solutions below