Information Gain= (Information before split)-(Information after split)
Information gain can be found by above equation. But what I don't understand is what is exactly the meaning of this information gain? Does it mean that how much more information is gained or reduced by splitting according to the given attribute or something like that???
Link to the answer: https://stackoverflow.com/a/1859910/740601
Information gain is the reduction in entropy achieved after splitting the data according to an attribute. IG = Entropy(before split) - Entropy(after split). See http://en.wikipedia.org/wiki/Information_gain_in_decision_trees
Entropy is a measure of the uncertainty present. By splitting the data, we are trying to reduce the entropy in it and gain information about it.
We want to maximize the information gain by choosing the attribute and split point which reduces the entropy the most.
If entropy = 0, then there is no further information which can be gained from it.