I would like to use function optim()
in R to minimise the target function. The two optimised parameters both have constrains.
I have created a test sampel data. Flow is a random series data separated by NAs. The function NAins()
can be seen at the end of this question.
flow = c(rep(NA,10),NAins(as.data.frame(runif(5000)), .1)$runif)
rain = runif (length(flow))
event = with(rle(!is.na(flow )),cbind(length=lengths[values],position=cumsum(c(1,lengths))[values]));
This function is to calculate r2.
test_function = function(ndays, event, flow, rain,upboundary){
flowvolume = rainvolume = raininweek = raininmonth =NULL;
for (i in 1:(length(event)/2)){
if (upboundary < event[,'position'][i]){
flowvolume[i] = sum(flow[(event[,'position'][i]):(event[,'position'][i]+event[,'length'][i]-1)], na.rm = TRUE) # total flow during the non NA period
rainvolume[i] = sum(rain[(event[,'position'][i]):(event[,'position'][i]+event[,'length'][i]-1)], na.rm = TRUE) # total rain during the non NA period
raininweek[i] = sum(rain[(event[,'position'][i]-ndays[1]):(event[,'position'][i]-1)], na.rm = TRUE) #total rain imediate before NA with a constrained period of nday[1]
raininmonth[i] = sum(rain[(event[,'position'][i]-ndays[2]-ndays[1]):(event[,'position'][i]-ndays[1]-1)], na.rm = TRUE) #total rain iprior to nday[1]
} else {next}
}
-summary(lm(flowvolume ~ rainvolume + raininweek + raininmonth))$r.squared # to minimise R2
}
This is the optimisation with constrains.
results= optim(par=c(2,20), lower=c(1,10), upper=c(10,30),method="L-BFGS-B",test_function, event=event, rain=rain, flow=flow,upboundary=30)
In this simulation, Results always converge to the staring position. If optim()
is not a good choose in this question, could you recommend some other packages or function to use?
Here is the function used to create sample flow data with random NA in between.
################################################################
# RANDOMLY INSERT A CERTAIN PROPORTION OF NAs INTO A DATAFRAME #
################################################################
NAins <- NAinsert <- function(df, prop){
n <- nrow(df)
m <- ncol(df)
num.to.na <- ceiling(prop*n*m)
id <- sample(0:(m*n-1), num.to.na, replace = FALSE)
rows <- id %/% m + 1
cols <- id %% m + 1
sapply(seq(num.to.na), function(x){
df[rows[x], cols[x]] <<- NA
}
)
return(df)
}
It seems the optimization is never moving far enough away from the starting point because these parameters are implicitly integer. However
optim
does not know this. It just sees a flat gradient.If your parameter space for ndays is small, as you indicate in your question, try enumerating over all these combinations instead. Here is a handy function. Hat tip to How to optimize for integer parameters (and other discontinuous parameter space) in R?.
Notice I had to cut off part of the search space for ndays[2], because it resulted in a negative subscript error. You will need to add some checks in your function to test for negative subscripts.