How can I select features for Symbolic Regression


How can I select features for symbolic regression? I have 30 features and want to use only the most sensitive ones.

As an example, this dataset, which is similar to mine, can be used: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html


There are 2 best solutions below

1
On

One possibility is to first use a random forest to fit the data, and then select the features that the random forest deems to be the most important.
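A minimal sketch of that idea with scikit-learn, in case it helps. The synthetic data from `make_regression` is only a stand-in for the real 30-feature dataset, and `top_k = 8` is an arbitrary cutoff chosen for illustration, not a recommendation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data: 30 features, only the first 5 informative
# (shuffle=False keeps the informative columns at indices 0..4).
X, y = make_regression(
    n_samples=500, n_features=30, n_informative=5,
    noise=1.0, shuffle=False, random_state=0,
)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank features by impurity-based importance, most important first.
ranking = np.argsort(forest.feature_importances_)[::-1]

top_k = 8  # assumed cutoff; tune it for your own data
selected = np.sort(ranking[:top_k])

# Keep only the selected columns and pass these to the symbolic regression tool.
X_selected = X[:, selected]
print("Selected feature indices:", selected)
```

Impurity-based importances can be misleading for some kinds of features, so it may be worth cross-checking the ranking with `sklearn.inspection.permutation_importance` before discarding anything.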

0
On

Thirty features is not that many. Genetic Programming should be able to select the most useful ones automatically.

That said, you should avoid using constants. Constants can let unimportant features slip into the final expression, because a feature multiplied by a very small constant barely changes the output.

However, it is difficult to exclude constants completely. For instance, the division operator generates constants as a side effect: x/x = 1, and once you have the constant 1 you can build 1+1, then 1/(1+1), and so on.
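To make the side effect concrete, here is a small NumPy sketch. The input values are arbitrary (and nonzero, so that x/x is defined); no constant appears anywhere in the expressions, yet they evaluate to constants:

```python
import numpy as np

# Arbitrary nonzero inputs; the results below do not depend on them.
x = np.array([0.5, 2.0, -3.0, 7.0])

one = x / x        # x/x behaves like the constant 1
two = one + one    # 1 + 1 behaves like the constant 2
half = one / two   # 1 / (1 + 1) behaves like the constant 0.5

print(one, two, half)
```

So even with constants removed from the terminal set, the search can still reassemble them from the variables and operators alone.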

Anyway, do you have some data to test on? I maintain a free program that implements a GP variant (Multi Expression Programming). If you send me the data I can run it for you, or you can try it yourself from my website: https://mepx.org

Update: I ran my program on the Boston house price dataset and obtained an error of about 4% relative to the expected output using only 8 features. A screenshot is attached. However, I am not sure whether the program is discovering some constants by itself, because the solution is quite long (37 instructions).

[Screenshot: result of the run on the Boston house price dataset]