This is probably super simple, but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful, and I was following everything about entropy and information gain until I got to the entropy calculation for each individual attribute.
I don't understand how the entropy for each individual attribute value (sunny, windy, rainy) is calculated; specifically, how p_i is calculated. It seems different from the way it is calculated for Entropy(S). Can anyone explain the process behind this calculation?
To split a node into two child nodes, one method consists of splitting the node according to the attribute that maximises your information gain. When you reach a pure leaf node, the information gain equals 0 (because you can't gain any information by splitting a node whose rows all belong to a single class - logic!).
In your example,
Entropy(S) = 1.571
is your current entropy - the one you have before splitting. Let's call it Hbase. Then, for each attribute you could split on, you compute the entropy of every child node that the split would create. This is the part you're asking about: inside a child node, p_i is the proportion of each class counted over only the rows that land in that child node, not over the whole dataset - that's why it looks different from the Entropy(S) calculation. To get your Information Gain, you subtract the weighted entropy of your child nodes from Hbase:
gain = Hbase - (child1NumRows/numOfRows) * entropyChild1 - (child2NumRows/numOfRows) * entropyChild2
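For instance, with made-up counts (your original table isn't shown): suppose the sunny child node gets 5 rows, 3 labelled "yes" and 2 labelled "no". Then p_yes = 3/5 and p_no = 2/5, so
Entropy(sunny) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) ≈ 0.971
and that entropy gets weighted by 5/numOfRows in the gain formula above.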
The objective is to pick the split with the highest Information Gain!
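If it helps, here's a minimal Python sketch of the whole computation - the mini-dataset and the attribute name "outlook" are made up, since your original table isn't shown:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy: p_i is the share of each class *within this list of labels*.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # Gain = Hbase minus the size-weighted entropies of the child nodes.
    h_base = entropy(labels)
    # Group the labels by the value this attribute takes in each row.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    weighted = sum((len(s) / len(labels)) * entropy(s)
                   for s in subsets.values())
    return h_base - weighted

# Hypothetical data, just to show the shapes involved:
rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rainy"},
        {"outlook": "windy"}, {"outlook": "rainy"}]
labels = ["yes", "no", "yes", "yes", "no"]
print(information_gain(rows, labels, "outlook"))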
Hope this helps! :)