Use of formula in information.gain in R

In the function definition for the FSelector information.gain function,

information.gain(formula, data)

what exactly is the purpose of the formula? I'm trying to use the function to do feature selection for a classification task. In the few examples I've seen online, the formula seems to define some kind of relationship between the class label and the features in the dataset. But since this is a classification task, I don't know of any exact (for example, linear) relationship between the features and the labels, so what should the formula be?

Best answer

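The formula does not describe a linear (or any other functional) relationship. It only tells the function which column is the class label (left of ~) and which columns are the candidate features (right of ~). You can use . on the right-hand side to tell R that you want to analyse the dependency between the class variable and all other variables in the data frame. For example, for the iris dataset: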

> library(FSelector)
> information.gain(Species~., iris)
                attr_importance
Sepal.Length       0.4521286
Sepal.Width        0.2672750
Petal.Length       0.9402853
Petal.Width        0.9554360

If you want to analyse the dependency on only a subset of the variables, name them explicitly on the right-hand side:

> information.gain(Species~Sepal.Length+Sepal.Width, iris)
                attr_importance
Sepal.Length       0.4521286
Sepal.Width        0.2672750
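
If the feature names are held in a character vector, the formula can also be built programmatically. The following is a minimal sketch using only base R (reformulate, all.vars and terms), with iris as the example data; the variable names features, f and f_all are made up for illustration. It is meant to show that the formula just names columns, not to reproduce what FSelector does internally.

```r
# Build "Species ~ Sepal.Length + Sepal.Width" from character vectors.
features <- c("Sepal.Length", "Sepal.Width")   # hypothetical feature subset
f <- reformulate(features, response = "Species")
print(f)

# The formula only names columns: the class first, then the features.
print(all.vars(f))

# "." is shorthand for "every other column"; terms() expands it
# against the data frame passed as 'data'.
f_all <- Species ~ .
print(attr(terms(f_all, data = iris), "term.labels"))

# Either formula can then be handed to information.gain(),
# assuming the FSelector package is installed.
if (requireNamespace("FSelector", quietly = TRUE)) {
  print(FSelector::information.gain(f, iris))
}
```

The last step is skipped when FSelector is not installed, so the sketch still runs in a plain R session.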