I have a model built using the regression modeling strategies that Frank Harrell discusses in his book Regression Modeling Strategies (i.e. pre-specify the model, use existing scientific knowledge, remain blinded to the dependent variable, apply data reduction, etc.). I can't really go into the details of the motivation for the model, but I can say that I spent more degrees of freedom than my data really support. I calculated a shrinkage estimate using the bootstrap to account for this, but it suggested that my model doesn't require any shrinkage. However, when I use plot(calibrate(model))
(from the rms R package), the apparent curve is almost spot-on with the ideal curve, while the bias-corrected curve is quite a bit off. When I use validate(model)
(again from the rms package), the optimism estimates aren't what I want to see either. So, my question: validate and calibrate seem to tell me that my model needs some improvement, particularly due to overfitting. How can I use these results to improve my actual model (i.e. obtain bias-corrected coefficients, covariance matrices, predictions, etc.)?
Here is a reproducible example of the kind of output I am seeing (though its calibration curve looks much better than mine).
set.seed(1)
require(rms)
data(iris)
# Simulate a binary outcome that is unrelated to the predictors
iris$outcome <- rbinom(150, 1, 0.3)
# x = TRUE and y = TRUE store the design matrix and response in the fit,
# which validate() and calibrate() require
model <- lrm(outcome ~ Sepal.Length + Sepal.Width + Petal.Length,
             data = iris, x = TRUE, y = TRUE)
plot(calibrate(model))
validate(model)
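For reference, the bootstrap shrinkage check mentioned above can be sketched like this: the bias-corrected calibration slope from validate() is often used as a linear shrinkage factor (the van Houwelingen and le Cessie heuristic). This is only an illustration of how such a factor might be extracted and applied, not necessarily the exact procedure I used; the B = 200 replications and the simple post-hoc scaling of the non-intercept coefficients are assumptions.

# Bias-corrected calibration slope from the bootstrap validation
v <- validate(model, B = 200)
shrinkage <- v["Slope", "index.corrected"]

# Shrink all coefficients except the intercept; after shrinking,
# the intercept should in principle be re-estimated (e.g. by refitting
# with the shrunken linear predictor as an offset) so that the mean
# predicted probability still matches the observed outcome rate.
shrunk_beta <- coef(model)
shrunk_beta[-1] <- shrunk_beta[-1] * shrinkage

A shrinkage factor near 1 is what led me to conclude that no shrinkage was needed, which is why the validate/calibrate output surprised me.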