I am interested in computing confidence intervals for my feature weights using a bootstrap approach. Is `scipy.stats.bootstrap` able to do this? Consider this classification task as an example (the same idea applies to regression tasks). Reading `coef_` from the fitted classifier gives a vector of feature weights.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
coefficients = clf.coef_
The idea would be to draw samples (with replacement) n times from X and y, fit the classifier on each resample, collect the coefficients, and finally compute confidence intervals from the coefficients across all resampling trials.
Yes, in the sense that `bootstrap` supports vector-valued statistics. `LinearDiscriminantAnalysis` seems to have trouble with some of the resamples (a resample can contain only one class), but you can see that the approach is valid by replacing the `clf` lines with something like `return X[:, 0].mean(), X[:, 1].var()`; i.e. bootstrap confidence intervals of the mean of the first feature and the variance of the second feature at the same time. Importantly, because `paired=True`, different features of the same observations stay paired, and of course the statistic can depend on both the first and second feature at the same time, as in your example.
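A minimal sketch of how this could be wired up with `scipy.stats.bootstrap`. The statistic below uses the mean/variance stand-in mentioned above rather than the classifier, since LDA raises an error on resamples containing a single class; to bootstrap the LDA weights instead, you would rebuild `X` inside the statistic (e.g. `np.c_[x0, x1]`), fit the classifier, and return `clf.coef_[0]`.

```python
import numpy as np
from scipy.stats import bootstrap

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

def statistic(x0, x1, y):
    # Stand-in for the classifier-based statistic: a vector with the
    # mean of the first feature and the variance of the second.
    # (To bootstrap LDA weights, rebuild X = np.c_[x0, x1], fit the
    # classifier, and return clf.coef_[0] instead -- but beware that
    # single-class resamples will make the fit fail.)
    return np.array([x0.mean(), x1.var()])

res = bootstrap(
    (X[:, 0], X[:, 1], y),   # each column passed as a separate sample
    statistic,
    paired=True,             # resample rows jointly, keeping features aligned
    vectorized=False,        # statistic handles one resample at a time
    n_resamples=999,
    method="percentile",
)
print(res.confidence_interval.low)
print(res.confidence_interval.high)
```

With a vector-valued statistic, `res.confidence_interval.low` and `.high` are arrays with one entry per output of the statistic.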