PyMC Bernoulli model checking

568 Views Asked by At

I am currently trying to do model checking with PyMC where my model is a Bernoulli model and I have a Beta prior. I want to do both a (i) gof plot as well as (ii) calculate the posterior predictive p-value.

I have got my code running with a Binomial model, but I am quite struggling to find the right way of making a Bernoulli model working. Unfortunately, there is no example anywhere that I can work with. My code looks like the following:

import pymc as mc
import numpy as np
alpha = 2
beta = 2
n = 13
yes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,0,0])

p = mc.Beta('p',alpha,beta)
surv = mc.Bernoulli('surv',p=p,observed=True,value=yes)
surv_sim = mc.Bernoulli('surv_sim',p=p)

mc_est = mc.MCMC({'p':p,'surv':surv,'surv_sim':surv_sim})
mc_est.sample(10000,5000,2)

import matplotlib.pylab as plt
plt.hist(mc_est.surv_sim.trace(),bins=range(0,3),normed=True)
plt.figure()
plt.hist(mc_est.p.trace(),bins=100,normed=True)

mc.Matplot.gof_plot(mc_est.surv_sim.trace(), 10/13., name='surv')

#here I have issues
D = mc.discrepancy(yes, surv_sim, p.trace())
mc.Matplot.discrepancy_plot(D)

The main problem I am having is in determining the expected values for the discrepancy function. Just using p.trace() does not work here, as these are the probabilities. Somehow, I need to incorporate the sample size here, but I am struggling to do that in a similar way as I would do it for a Binomial model. I am also not quite sure, if I am doing the gof_plot correctly.

Hope someone can help me out here! Thanks!

1

There are 1 best solutions below

3
On

Per the discrepancy function doc string, the parameters are:

observed : Iterable of observed values (size=(n,))
simulated : Iterable of simulated values (size=(r,n))
expected : Iterable of expected values (size=(r,) or (r,n))

So you need to correct two things:

1) modify your simulated results to have size n (e.g., 13 in your example):

surv_sim = mc.Bernoulli('surv_sim', p=p, size=n)

2) encapsulate your p.trace() with the bernoulli_expval method:

D = mc.discrepancy(yes, surv_sim.trace(), mc.bernoulli_expval(p.trace()))

(bernoulli_expval just spits back p.)

With those two changes, I get the following:

enter image description here