Interpreting Variable Importance from Random Forest in GEE

This is more of a theoretical/function question. I'm doing a land cover classification in Google Earth Engine using random forest and need to report Variable Importance. Does anyone know how to interpret the Variable Importance values returned by the random forest algorithm in GEE?

In terms of code, I got importance by doing:

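// Train a random forest with 1000 trees on the training samples.
var RFmodel = ee.Classifier.smileRandomForest(1000).train(trainingData, 'classID', predictionBands);

// explain() returns a dictionary describing the trained classifier.
var RFexp = RFmodel.explain();

// Wrap the 'importance' entry in a feature so it can be printed or exported.
var VarImp = ee.Feature(null, ee.Dictionary(RFexp).get('importance'));
print('Variable Importance:', VarImp);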

However, the resulting importance values range from 0 to 60. This doesn't look like the importance measure "Mean Decrease in Accuracy" or "Gini Index" from the randomForest package in R (which I'm more familiar with). So I guess I'm not really sure what these values mean in terms of variable "importance". Can anyone please help me to understand this?

Thanks in advance!

There is 1 best solution below

In GEE, I believe this is the sum of the decrease in Gini impurity over all trees in the forest. In R, I believe it is a weighted mean, so the difference is mean versus sum. Your code trains 1000 trees, so if the values are summed over every tree, they can easily end up much larger than a per-tree mean would be. Also see the GEE code notes for further info, e.g. line 126. RandomForest.java#L120
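
If the values really are sums over all trees, one way to make them easier to report is to rescale them on the client side. The snippet below is only a sketch under that assumption: the band names and numbers are made up, and `meanPerTree` and `relativePercent` are hypothetical helper names, not part of the GEE API. It works on a plain JavaScript object, such as the one you would get by calling `getInfo()` on the importance dictionary.

```javascript
// Hypothetical importance values (made up for illustration), shaped like the
// plain object returned by calling getInfo() on the 'importance' dictionary.
const importance = { B2: 12, B3: 18, B4: 30, NDVI: 60 };

// Number of trees passed to ee.Classifier.smileRandomForest() in the question.
const numberOfTrees = 1000;

// Assumption: each value is a sum over all trees, so dividing by the number
// of trees gives an average decrease per tree.
function meanPerTree(imp, nTrees) {
  const out = {};
  for (const [band, value] of Object.entries(imp)) {
    out[band] = value / nTrees;
  }
  return out;
}

// Rescale the values so they add up to 100. This does not depend on the
// sum-versus-mean question, because only the ratios between bands matter.
function relativePercent(imp) {
  const total = Object.values(imp).reduce((a, b) => a + b, 0);
  const out = {};
  for (const [band, value] of Object.entries(imp)) {
    out[band] = (value * 100) / total;
  }
  return out;
}

const perTree = meanPerTree(importance, numberOfTrees);
const percent = relativePercent(importance);

console.log('Mean decrease per tree:', perTree);
console.log('Relative importance (%):', percent);
```

Either way the ranking of the bands stays the same, so for reporting purposes the relative percentages are probably the safer choice, since they do not rely on knowing exactly how the raw values were accumulated.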

Good luck, and I'm also interested in further explanation if someone can elaborate.