How to get log-likelihood for each iteration in sklearn GMM?


I am trying to fit a GMM in sklearn, and I see that the model converges after around three iterations, but I cannot seem to access the log-likelihood score computed at each iteration.

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4, tol=1e-8).fit(data)

Is there a way to access the log-likelihood score at each iteration?


Accepted answer:

If you just want to look at the log-likelihood scores, you can set verbose=2 to print the change in log-likelihood and verbose_interval=1 to print it at every iteration:

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, tol=1e-8, verbose=2, verbose_interval=1)
gmm.fit(data)

Initialization 0
  Iteration 1    time lapse 0.00560s     ll change inf
  Iteration 2    time lapse 0.00134s     ll change 0.03655
  Iteration 3    time lapse 0.00119s     ll change 0.00867
  Iteration 4    time lapse 0.00118s     ll change 0.00619
  Iteration 5    time lapse 0.00116s     ll change 0.00612
  Iteration 6    time lapse 0.00125s     ll change 0.00647
  Iteration 7    time lapse 0.00128s     ll change 0.00700
  Iteration 8    time lapse 0.00127s     ll change 0.00727
  Iteration 9    time lapse 0.00126s     ll change 0.00673
  Iteration 10   time lapse 0.00117s     ll change 0.00604
  Iteration 11   time lapse 0.00109s     ll change 0.00530
  Iteration 12   time lapse 0.00125s     ll change 0.00431
  Iteration 13   time lapse 0.00121s     ll change 0.00366
  Iteration 14   time lapse 0.00123s     ll change 0.00404
  Iteration 15   time lapse 0.00130s     ll change 0.00361
  Iteration 16   time lapse 0.00118s     ll change 0.00157
  Iteration 17   time lapse 0.00124s     ll change 0.00048
  Iteration 18   time lapse 0.00126s     ll change 0.00015
  Iteration 19   time lapse 0.00115s     ll change 0.00005
  Iteration 20   time lapse 0.00116s     ll change 0.00001
  Iteration 21   time lapse 0.00124s     ll change 0.00000
  Iteration 22   time lapse 0.00122s     ll change 0.00000
  Iteration 23   time lapse 0.00142s     ll change 0.00000
  Iteration 24   time lapse 0.00126s     ll change 0.00000
  Iteration 25   time lapse 0.00124s     ll change 0.00000
  Iteration 26   time lapse 0.00122s     ll change 0.00000
  Iteration 27   time lapse 0.00120s     ll change 0.00000
Initialization converged: True   time lapse 0.03765s     ll -1.20124

To actually capture these values, depending on your setup, you can either write them to a log with the logging module, or, in a Jupyter notebook, capture the cell's stdout with the %%capture magic:

%%capture cap --no-stderr
gmm.fit(data)

Then we load the captured text into a DataFrame and back-calculate the log-likelihood from the final value and the per-iteration changes:

import numpy as np
import pandas as pd

# Column 1 of each split line is the iteration number, column 7 the ll change
# (or, on the final "converged" line, the final log-likelihood itself).
res = pd.DataFrame([i.split() for i in cap.stdout.split("\n")]).iloc[:, [1, 7]]
res.columns = ['iteration', 'change']
res.change = res.change.astype('float64')
res = res[np.isfinite(res.change)]  # drop the "ll change inf" row and blank lines
# Start every row at the final log-likelihood, then subtract the reverse
# cumulative sum of the changes to recover each earlier iteration's value.
res['logLik'] = res['change'].values[-1]
res.loc[:len(res), ['logLik']] = -res.change[:-1][::-1].cumsum()[::-1] + res.change.values[-1]
res


    iteration   change  logLik
2   2   0.03655 -1.31546
3   3   0.00867 -1.27891
4   4   0.00619 -1.27024
5   5   0.00612 -1.26405
6   6   0.00647 -1.25793
7   7   0.00700 -1.25146
8   8   0.00727 -1.24446
9   9   0.00673 -1.23719
10  10  0.00604 -1.23046
11  11  0.00530 -1.22442
12  12  0.00431 -1.21912
13  13  0.00366 -1.21481
14  14  0.00404 -1.21115
15  15  0.00361 -1.20711
16  16  0.00157 -1.20350
17  17  0.00048 -1.20193
18  18  0.00015 -1.20145
19  19  0.00005 -1.20130
20  20  0.00001 -1.20125
21  21  0.00000 -1.20124
22  22  0.00000 -1.20124
23  23  0.00000 -1.20124
24  24  0.00000 -1.20124
25  25  0.00000 -1.20124
26  26  0.00000 -1.20124
27  27  0.00000 -1.20124
28  converged:  -1.20124    -1.20124
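Outside a notebook, %%capture is unavailable, but the same output can be captured with contextlib.redirect_stdout and parsed without pandas. A sketch of the reconstruction (the regex and loop are my own, not from the answer; the sample string is a truncated copy of the verbose output above, and sklearn's verbose text format is an implementation detail that may change between versions):

```python
import re

# In a plain script you would capture the printed output first, e.g.:
#   import io, contextlib
#   buf = io.StringIO()
#   with contextlib.redirect_stdout(buf):
#       gmm.fit(data)
#   captured = buf.getvalue()
# Here we use a truncated copy of the verbose output shown above instead.
captured = """\
Initialization 0
  Iteration 1   time lapse 0.00560s   ll change inf
  Iteration 2   time lapse 0.00134s   ll change 0.03655
  Iteration 3   time lapse 0.00119s   ll change 0.00867
Initialization converged: True   time lapse 0.03765s   ll -1.20124
"""

# Extract (iteration, ll change) pairs and the final log-likelihood with a
# regex, which is less fragile than positional splitting if spacing changes.
changes = {int(m.group(1)): float(m.group(2))
           for m in re.finditer(r"Iteration (\d+).*?ll change ([\d.]+|inf)",
                                captured)}
final_ll = float(re.search(r"converged: True.*?ll (-?[\d.]+)",
                           captured).group(1))

# Walk backwards from the final value, subtracting each iteration's change
# (same convention as the table in the answer: each entry is the
# log-likelihood entering that iteration).
log_lik = {}
ll = final_ll
for it in sorted(changes, reverse=True):
    if changes[it] != float("inf"):
        ll -= changes[it]
    log_lik[it] = ll

print(final_ll, log_lik)
```

Because the sample here is truncated to three iterations, the intermediate values differ from the full table above, but the reconstruction rule is the same.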