How does pessimistic error pruning in C4.5 algorithm working?

213 Views Asked by At

I'm studing C4.5 algorithm and trying to make it in java by myself, but in the part of pruning, I don't understand what it compute in book of C4.5.

In this book,it said:"When N training cases are covered by a leaf, E of them incorrectly....For a given confidence level CF, ..., this upper limit is here written UCF(E,N)(default confidence level is 25%)....a leaf covering N training cases with a predicted error rate of UCF(E,N) would give rise to predicted N*UCF(E,N) errors."

author show a example, in this subtree:

  • education cost == n: democrat(6)
  • education cost == y: democrat(9)
  • education cost == u: republican(1)

For the first leaf, N=6, E=0, U25%(0,6)=0.206, for the remaining leaves, U25%(0,9)=0.143 and U25%(0,1)=0.75, so the number of predicted errors for this subtree is 60.206+90.143+1*0.75=3.273.

If the subtree were replace by a democrat, the predicted errors is 16U25%(1,16)=160.157=2.512, it is less than 3.273, so it should pruned to a leaf.

But I don't know how it computed U25%(0,6)=0.206, U25%(0,9)=0.143 and so on, and inceased the number of errors observed at each leaf by 0.5, where were put 0.5 into? I have search on internet, there are many different formula so I don't know which one is correct.

0

There are 0 best solutions below