What does the order of eliminated features in CatBoost's select_features mean?


I came across CatBoost's select_features function, which uses recursive feature elimination (RFE). Does the order of the eliminated features represent the order in which they were removed, or is it random?

https://catboost.ai/en/docs/concepts/python-reference_catboost_select_features
https://catboost.ai/en/docs/concepts/output-data_features-selection

I am assuming the order is not random, but represents the order in which features were removed in a given iteration.

Jesse Sealand On

The ordering of the features depends on the algorithm parameter, which defines how those features are identified. In model.select_features(algorithm=...), this parameter accepts three possible values, as described in the docstring below.

        algorithm : EFeaturesSelectionAlgorithm or string, optional (default=RecursiveByShapValues)
            Which algorithm to use for features selection.
            Possible values:
                - RecursiveByPredictionValuesChange
                    Use prediction values change as feature strength, eliminate batch of features at once.
                - RecursiveByLossFunctionChange
                    Use loss function change as feature strength, eliminate batch of features at each step.
                - RecursiveByShapValues
                    Use shap values to estimate loss function change, eliminate features one by one.

Of these, only RecursiveByShapValues, which is the default, eliminates features one by one, so it is the only setting whose eliminated features are listed in order of removal.
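To check the order on your own data, a minimal sketch along these lines may help. It assumes catboost and numpy are installed, uses made-up synthetic data, and relies on the returned summary containing the 'eliminated_features' key described on the output-data page linked in the question. The helper names elimination_order and run_selection are hypothetical, not part of CatBoost.

```python
def elimination_order(summary):
    """Pair each eliminated feature index with its position in the returned list."""
    return list(enumerate(summary["eliminated_features"], start=1))


def run_selection():
    """Run select_features on synthetic data and return its summary dict."""
    # Local imports so elimination_order can be reused without catboost installed.
    import numpy as np
    from catboost import CatBoostRegressor, Pool

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 6))
    # Only the first two columns carry signal; the other four are pure noise.
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

    model = CatBoostRegressor(iterations=100, random_seed=0)
    return model.select_features(
        Pool(X[:400], y[:400]),
        eval_set=Pool(X[400:], y[400:]),
        features_for_select=list(range(6)),
        num_features_to_select=2,
        steps=2,
        algorithm="RecursiveByShapValues",  # the default, stated explicitly
        train_final_model=False,
        logging_level="Silent",
    )
```

Calling elimination_order(run_selection()) gives (position, feature index) pairs. Comparing them against the per-step log (with logging_level left at its default) is one way to confirm the ordering empirically.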

edit

Knowing the default setting, we can dig into the feature selection algorithm in the CatBoost source code, which is written in C++, not Python, in the folder catboost/libs/features_selection. Here is the direct link to the function definition.

Based on my reading of the function, it operates by recursively eliminating the worst-scoring feature at each round. This indicates to me that the 'eliminated_features' list exposed in the Python API is ordered by when each feature was eliminated.
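The idea can be illustrated with a toy sketch. This is not CatBoost's C++ implementation, only a simplified model of "drop the worst-scoring feature each round". The function recursive_elimination and the fixed STRENGTH scores are hypothetical stand-ins for the SHAP-based estimates.

```python
def recursive_elimination(scores_fn, features, num_to_keep):
    """Drop the weakest feature each round and record the removal order."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > num_to_keep:
        scores = scores_fn(remaining)  # re-score only the surviving features
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        eliminated.append(worst)  # appending preserves chronological order
    return remaining, eliminated


# Hypothetical fixed strengths standing in for SHAP-based estimates.
STRENGTH = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}


def toy_scores(active):
    return {f: STRENGTH[f] for f in active}


selected, eliminated = recursive_elimination(toy_scores, ["a", "b", "c", "d"], 2)
print(selected)    # ['a', 'c']
print(eliminated)  # ['b', 'd']
```

Because each removed feature is appended as it is dropped, the resulting list is chronological by construction: 'b' (the weakest) comes first, then 'd'.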