What is the actual meaning implied by information gain in data mining?

Information Gain = (Information before split) - (Information after split)

Information gain can be found with the above equation, but what I don't understand is what exactly this information gain means. Does it measure how much information is gained or lost by splitting on the given attribute, or something like that?

Link to the answer: https://stackoverflow.com/a/1859910/740601


There are 2 answers below.

Answer 1:

Information gain is the reduction in entropy achieved after splitting the data according to an attribute. IG = Entropy(before split) - Entropy(after split). See http://en.wikipedia.org/wiki/Information_gain_in_decision_trees

Entropy is a measure of the uncertainty in the class labels. By splitting the data, we try to reduce its entropy and thereby gain information about it.
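As a minimal sketch, assuming the class labels of a node are given as a plain Python list (the function name `entropy` is illustrative, not from any library), the entropy in bits can be computed like this:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Sum of p * log2(1/p) over the classes that actually occur
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


print(entropy(["yes", "yes", "yes", "yes"]))  # pure subset: 0.0
print(entropy(["yes", "no", "yes", "no"]))    # balanced two classes: 1.0
```

A pure subset has entropy 0, and a balanced two-class subset has the maximum of 1 bit.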

We want to maximize the information gain by choosing the attribute and split point that reduce the entropy the most.
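The selection step can be sketched as follows. This assumes categorical attributes only (one branch per attribute value, so no numeric split points are searched), records stored as dicts, and hypothetical helper names and toy data; it is an illustration of the idea, not any particular library's implementation:

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """IG of a multiway split on a categorical attribute (one branch per value)."""
    branches = defaultdict(list)
    for row, label in zip(rows, labels):
        branches[row[attribute]].append(label)
    n = len(labels)
    # Entropy after the split: branch entropies weighted by branch size
    after = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - after


def best_attribute(rows, labels, attributes):
    """Return the attribute whose split reduces the entropy the most."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: 'outlook' separates the classes perfectly, 'windy' not at all
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

Here splitting on `outlook` yields an IG of 1 bit (both branches become pure), while `windy` yields 0 bits, so `outlook` is chosen.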

If the entropy is 0, all records share the same class label, so no further information can be gained by splitting them.

Answer 2:

Correctly written, it is:

Information-gain = entropy-before-split - average entropy-after-split (the average is weighted by the number of records in each branch)
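Written out as a formula, with notation chosen here for illustration (S is the set of records at the node, S_v the subset where attribute A takes value v, and p_c the fraction of records in class c), the weighted version reads:

```latex
\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v),
\qquad
H(S) = -\sum_{c} p_c \log_2 p_c
```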

The difference between entropy and information is the sign: entropy is high when you have little information about the data.

The intuition comes from statistical information theory. The rough idea is: how many bits per record do you need to encode the class label assignment? If only one class is left, you need 0 bits per record. If you have a maximally chaotic data set with two equally frequent classes, you need 1 bit for every record. If the classes are unbalanced, you can get away with less than that using a (theoretical!) optimal compression scheme, e.g. by encoding only the exceptions. To match this intuition, you should, of course, use the base-2 logarithm.
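To put numbers on this intuition, here is a small sketch (the helper name `bits_per_record` and the class counts are hypothetical) that computes the entropy, i.e. the theoretical minimum average number of bits per class label, for a pure set, a balanced two-class set, and a 9:1 unbalanced set:

```python
import math


def bits_per_record(class_counts):
    """Entropy in bits: the theoretical minimum average code length per label."""
    n = sum(class_counts)
    return sum((c / n) * math.log2(n / c) for c in class_counts if c > 0)


for name, counts in [("one class", [10]),
                     ("balanced two classes", [5, 5]),
                     ("unbalanced 9:1", [9, 1])]:
    print(f"{name}: {bits_per_record(counts):.3f} bits/record")
```

This prints 0.000, 1.000 and 0.469 bits per record respectively, so the unbalanced set needs well under 1 bit per record, as described above.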

A split is considered good if the branches have lower entropy, on average, afterwards. Then you have gained information about the class label by splitting the data set. The IG value is the average number of bits of information per record that you gained for predicting the class label.
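As a worked example on hypothetical data: take 8 records with 4 of each class, and a split into two branches S_1 and S_2 of 4 records each, with class counts 3:1 and 1:3. Then:

```latex
\begin{aligned}
H(S) &= -\tfrac{4}{8}\log_2\tfrac{4}{8} - \tfrac{4}{8}\log_2\tfrac{4}{8} = 1 \text{ bit} \\
H(S_1) = H(S_2) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811 \text{ bits} \\
\mathrm{IG} &= 1 - \left(\tfrac{4}{8}\cdot 0.811 + \tfrac{4}{8}\cdot 0.811\right) \approx 0.189 \text{ bits}
\end{aligned}
```

So this split gains about 0.189 bits of information per record about the class label.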