Choosing a sample rate for GBM models


I've built several GBM models to tune the hyperparameters (number of trees, shrinkage, and depth) for my data, and the model performs well on an out-of-time sample. The data consists of credit card transactions (hundreds of millions of records), so I sampled 1% of the goods (non-events) and 100% of the bads (events).
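Roughly, the sampling scheme looks like the sketch below (the DataFrame `transactions`, the label column `is_bad`, and the use of pandas are illustrative assumptions, not my exact setup):

```python
import pandas as pd

def downsample_goods(transactions: pd.DataFrame,
                     label_col: str = "is_bad",
                     good_rate: float = 0.01,
                     seed: int = 42) -> pd.DataFrame:
    """Keep every bad (event) row and a random fraction of the good rows."""
    bads = transactions[transactions[label_col] == 1]           # 100% of events
    goods = transactions[transactions[label_col] == 0].sample(
        frac=good_rate, random_state=seed)                      # e.g. 1% of non-events
    # Shuffle so goods and bads are interleaved before training
    return pd.concat([bads, goods]).sample(frac=1.0, random_state=seed)

# train_df = downsample_goods(transactions, good_rate=0.01)
```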

However, when I increased the sample to 3% of the goods, there was a noticeable improvement in performance. My question is: how do I decide the optimal sampling rate without running several iterations and picking whichever one fits best? Is there any theory to guide this choice?

For context, the 1% sample gives me about 3 million transactions in total, of which roughly 380k are bads, with ~250 variables.
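The brute-force approach I'm trying to avoid looks roughly like the sketch below: refit at a few candidate rates and compare out-of-time performance. It reuses the `downsample_goods` sketch above; the `HistGradientBoostingClassifier`, the `oot` frame, the `FEATURES` list, and the candidate rates beyond 1% and 3% are illustrative stand-ins rather than my actual implementation.

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def oot_auc_at_rate(train_raw, oot, features, good_rate):
    """Downsample goods at `good_rate`, fit a GBM, and score it out-of-time."""
    train = downsample_goods(train_raw, good_rate=good_rate)
    model = HistGradientBoostingClassifier(max_iter=500,
                                           learning_rate=0.05,
                                           max_depth=6)
    model.fit(train[features], train["is_bad"])
    scores = model.predict_proba(oot[features])[:, 1]
    return roc_auc_score(oot["is_bad"], scores)

# for rate in (0.01, 0.03, 0.05, 0.10):
#     print(rate, oot_auc_at_rate(train_raw, oot, FEATURES, rate))
```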
