I read the scikit-learn documentation for RANSACRegressor. It says the min_samples parameter is highly dependent upon the model.
So, how can one calculate the min_samples parameter for a non-linear estimator? For example, I want to use SVR with an RBF kernel. What should min_samples be in this case?
There is no general rule that gives you an approximate min_samples value. However, you can use domain knowledge to get a starting point. For example, if the relationship between the features and the target variable is highly nonlinear, then we can assume there is quite a bit of noise and will want a higher value of min_samples. The higher the value of min_samples, the more data points are drawn for each candidate fit, and the more of them need to be inliers for that fit to be good; and vice versa.
On the other hand, you can let the machine estimate it for you. Run a grid search over different values of min_samples with cross-validation and pick the one that scores highest on both the training and validation sets.