I'm training an SVM model on thousands of features (X) and labeled data (y), intending to use it to predict y_new. Some of the features will not be very important for predicting y, so I want to reduce the number of features, keeping only the most important ones. On the scikit-learn feature selection overview, it is written:
With SVMs and logistic-regression, the parameter C controls the sparsity: the smaller C the fewer features selected.
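If I've understood that passage correctly, it is about L1-penalized linear models used as selectors, roughly like the sketch below (the make_classification data is just a stand-in for my own X and y):

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC
    from sklearn.feature_selection import SelectFromModel

    # Stand-in for my real data: many features, few of them informative
    X, y = make_classification(n_samples=500, n_features=2000,
                               n_informative=20, random_state=0)

    # L1-penalized linear SVM: a smaller C gives a sparser coef_,
    # which means fewer features are kept by the selector
    lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
    selector = SelectFromModel(lsvc, prefit=True)
    X_reduced = selector.transform(X)
    print(X_reduced.shape)  # far fewer than 2000 columns remain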
Does that statement apply to SVC (with any kernel) as well as to LinearSVC? In other words, if I want to use, say, SVC with the 'rbf' kernel, do I need to select a subset of features as a preliminary step (with recursive feature elimination or the LASSO, for example) and then train my SVC on only those features, or can I train the SVC directly on the thousands of features and simply make C "small"?
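To make the question concrete, this is roughly the two-step approach I have in mind, sketched with an L1-based selector (recursive feature elimination would play the same role) feeding an RBF-kernel SVC; X_new here just stands for the new samples I want predictions for:

    from sklearn.datasets import make_classification
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC, SVC
    from sklearn.feature_selection import SelectFromModel

    # Stand-in data; in practice X, y are my training set
    X, y = make_classification(n_samples=500, n_features=2000,
                               n_informative=20, random_state=0)
    X_new = X[:10]  # pretend these are the new samples

    # Step 1: feature selection as a preliminary step
    # Step 2: RBF-kernel SVC trained only on the kept features
    clf = Pipeline([
        ("select", SelectFromModel(LinearSVC(C=0.01, penalty="l1", dual=False))),
        ("svc", SVC(kernel="rbf", C=1.0)),
    ])
    clf.fit(X, y)
    y_new = clf.predict(X_new)

Is this preliminary selection step necessary, or is making C small in the SVC itself enough?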