While learning about entropy and information gain, I am trying to solve this exercise: the probability of cancer in a population is 1%. A test for cancer correctly identifies cancer patients with a probability of 50% and non-cancer patients with a probability of 99.5%. I have to calculate the information gain obtained from this cancer test.

Edit - my attempt at the calculation:
If we take the total population to be 100:
Cancer patients = 1
Non-cancer patients = 99
Entropy H = -(1/100) log(1/100) - (99/100) log(99/100)
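To double-check this number, here is a small Python snippet I used; I am assuming log base 2 (bits) throughout, since the exercise does not say which base to use:

```python
import math

# Prior class distribution: 1% cancer, 99% non-cancer
p_cancer, p_no_cancer = 0.01, 0.99

# Prior entropy in bits (assuming log base 2)
H = -p_cancer * math.log2(p_cancer) - p_no_cancer * math.log2(p_no_cancer)
print(H)  # about 0.0808 bits
```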
Now, for cancer patients the test gives me 50% classified as cancer and 50% as non-cancer. Hence the entropy for the cancer-patient group is:
H1 = -(1/2) log(1/2) - (1/2) log(1/2)
For non-cancer patients it gives 99.5% classified as non-cancer and 0.5% as cancer. Hence the entropy for the non-cancer group is:
H2 = -(0.995) log(0.995) - (0.005) log(0.005)
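Numerically, my H1 and H2 come out as below (again assuming log base 2; the `entropy` helper is just my own shorthand, not from any library):

```python
import math

def entropy(*probs):
    """Shannon entropy (in bits) of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Within the cancer group the test splits patients 50% / 50%
H1 = entropy(0.5, 0.5)       # exactly 1 bit

# Within the non-cancer group the test splits patients 99.5% / 0.5%
H2 = entropy(0.995, 0.005)   # about 0.045 bits

print(H1, H2)
```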
I want to know whether this is the correct way to get the entropy after the test. If it is, the information gain can be calculated as:
Information gain = H - (H1+H2)
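Putting the pieces together, this is what my formula gives numerically (this just evaluates the expression as I wrote it, it is not meant to assert that the formula itself is right, which is what I am asking):

```python
import math

def entropy(*probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

H  = entropy(0.01, 0.99)     # prior entropy, about 0.0808 bits
H1 = entropy(0.5, 0.5)       # entropy within the cancer group
H2 = entropy(0.995, 0.005)   # entropy within the non-cancer group

# Information gain exactly as I wrote it above (no weighting of H1 and H2)
print(H - (H1 + H2))         # comes out negative, about -0.965
```

The result is negative, which makes me suspect that simply adding H1 and H2 without weighting them is not right, so I would like to know the correct procedure.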