Is it possible to get negative information gain when Laplace smoothing is used?
We know:
$$IG = H(Y) - H(Y \mid X)$$
Here, H is the entropy function and IG is the information gain.
Also:
$$H(Y) = -\sum_y P(Y=y)\,\log_2 P(Y=y)$$
$$H(Y \mid X) = \sum_x P(X=x)\, H(Y \mid X=x)$$
$$H(Y \mid X=x) = -\sum_y P(Y=y \mid X=x)\,\log_2 P(Y=y \mid X=x)$$
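To make the definitions concrete, here is a minimal Python sketch of how I compute these quantities from a known joint distribution (the table `p_xy` below is just a made-up example):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i * log2(p_i), skipping zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(p_xy):
    """IG = H(Y) - H(Y|X) for a joint probability table p_xy of shape (|X|, |Y|)."""
    p_x = p_xy.sum(axis=1)                 # P(X = x)
    p_y = p_xy.sum(axis=0)                 # P(Y = y)
    h_y = entropy(p_y)                     # H(Y)
    # H(Y|X) = sum_x P(X=x) * H(Y | X=x)
    h_y_given_x = sum(
        p_x[i] * entropy(p_xy[i] / p_x[i])
        for i in range(len(p_x)) if p_x[i] > 0
    )
    return h_y - h_y_given_x

# A made-up joint distribution over (X, Y), just to exercise the formulas.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])
print(information_gain(p_xy))
```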
For example, suppose I estimate $P(Y=y \mid X=x) = n_{y|x} / n_x$, where $n_x$ is the number of training examples with $X=x$ and $n_{y|x}$ is the number of those that also have $Y=y$. But it is possible that $n_x = 0$ (and hence $n_{y|x} = 0$), which leaves the estimate undefined. So I apply Laplace smoothing and define $P(Y=y \mid X=x) = (n_{y|x} + 1)/(n_x + |X|)$. Here, $|X|$ denotes the number of possible values that $X$ can take (the number of splits created if $X$ is chosen as the splitting attribute). Is it possible that, due to Laplace smoothing, I get negative information gain?
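For concreteness, here is a sketch of the smoothed computation I have in mind (the count table is made up, and for this sketch I only smooth the conditional $P(Y=y \mid X=x)$ while taking $P(X=x)$ and $P(Y=y)$ from the raw counts):

```python
import numpy as np

def entropy(p):
    """-sum_i p_i * log2(p_i), skipping zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def smoothed_information_gain(counts, smoothing_denom=None):
    """IG = H(Y) - H(Y|X) with Laplace-smoothed conditionals.

    counts[i, j] = number of training examples with X = x_i and Y = y_j.
    P(Y=y | X=x) is estimated as (n_{y|x} + 1) / (n_x + smoothing_denom),
    with smoothing_denom = |X| (the number of values of X) by default,
    matching the definition above. P(X) and P(Y) use the raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()
    if smoothing_denom is None:
        smoothing_denom = counts.shape[0]    # |X|, per the definition above

    p_x = counts.sum(axis=1) / n_total       # unsmoothed P(X = x)
    p_y = counts.sum(axis=0) / n_total       # unsmoothed P(Y = y)

    h_y = entropy(p_y)
    h_y_given_x = 0.0
    for i in range(counts.shape[0]):
        n_x = counts[i].sum()
        # Laplace-smoothed conditional P(Y = y | X = x_i)
        p_y_given_x = (counts[i] + 1.0) / (n_x + smoothing_denom)
        h_y_given_x += p_x[i] * entropy(p_y_given_x)

    return h_y - h_y_given_x

# Made-up counts, just to probe the sign of the result numerically.
counts = np.array([[3, 0],
                   [0, 2]])
print(smoothed_information_gain(counts))
```

Experimenting with different count tables should show whether the smoothed estimate can push $H(Y \mid X)$ above $H(Y)$.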