I am trying to model the following problem with PyMC.
Description of the model
I have N devices, each of which is either broken or working. The fraction of working devices is an unknown parameter f_working.
Each working device has an efficiency eps (different for each device) between 0 and 1, distributed as a TruncatedNormal('eps', mu=mu, sigma=sigma, lower=0, upper=1) with unknown mu and sigma.
Each device can be tested individually. The result of the test provides
- whether the device is broken or working,
- and, if the device is working, its efficiency.
The test is not perfect, though: it has trouble distinguishing low-efficiency devices from broken ones. If a device has efficiency eps, the test misclassifies it as broken with probability P = f(eps). For simplicity, let's take f(eps) = 1 - eps**0.5.
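In other words, a working device is correctly flagged as working with probability eps**0.5, the complement of f (the helper name p_detected below is just for illustration):

def p_detected(eps):
    # Probability that a working device with efficiency eps is
    # correctly classified as working: the complement of f(eps) = 1 - eps**0.5
    return eps**0.5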
After testing all N = 300 devices, my goal is to infer the posterior distributions of the parameters f_working, mu, and sigma.
The issue
I don't understand how to model the observed distribution of eps with PyMC. Its density is the product of the intrinsic density (a TruncatedNormal) and the probability of the test being right (eps**0.5). PyMC offers Mixture for weighted sums of densities, but I couldn't find a simple way to express a product of densities.
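Concretely, I believe the observed efficiencies should follow the intrinsic density reweighted by the detection probability and renormalized:

$$p_\mathrm{obs}(\varepsilon) = \frac{p_\mathrm{TN}(\varepsilon \mid \mu, \sigma)\,\sqrt{\varepsilon}}{Z(\mu, \sigma)}, \qquad Z(\mu, \sigma) = \int_0^1 p_\mathrm{TN}(\varepsilon \mid \mu, \sigma)\,\sqrt{\varepsilon}\,\mathrm{d}\varepsilon$$

and the normalization constant Z depends on mu and sigma, which is part of what makes this awkward to express.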
In the end, I resorted to defining a new CustomDist, which seems like overkill to me, and I still couldn't get it to work. Could you help me understand what is going on, and possibly suggest a better way to model this problem?
My attempt
The following code generates the observed sample:
import numpy as np
import pymc
import pymc.math as pt
rng = np.random.default_rng(1234)
true_params = {
    "f_working": 0.8,
    "mu": 0.3,
    "sigma": 0.5,
}
N = 300
# Generate the intrinsic distributions:
# Get the number of working devices
N_working = rng.binomial(n=N, p=true_params["f_working"])
# Sample the intrinsic distribution of efficiencies
eps_intrinsic = rng.normal(
    loc=true_params["mu"],
    scale=true_params["sigma"],
    size=N * 100)  # This should be enough
# Reject values outside of 0-1
eps_intrinsic = eps_intrinsic[(eps_intrinsic>=0) & (eps_intrinsic<=1)]
eps_intrinsic = eps_intrinsic[:N_working]
# Generate the observed distributions:
# Determine which of the working devices are correctly classified as working
working_observed = rng.random(N_working) < eps_intrinsic**0.5
# Generate the set of observations
measurements = {
    "N_working_observed": np.sum(working_observed) * np.ones(working_observed.shape),
    "eps_observed": eps_intrinsic[working_observed],
}
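For reference, the observations that enter the inference are the number of devices the test flags as working and the efficiencies it reports for them:

# The observations passed to the model below
print(int(measurements["N_working_observed"][0]))  # number of devices observed as working
print(measurements["eps_observed"].shape)          # efficiencies reported for those devices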
The following is how I am modelling it with PyMC:
with pymc.Model() as model:
    # Define the priors
    prior = {
        "f_working": pymc.Uniform("f_working", lower=0, upper=1),
        "mu": pymc.Normal("mu", mu=0.5, sigma=3),
        "sigma": pymc.Exponential("sigma", lam=2),
    }
    # The number of working devices
    N_working = pymc.Binomial("N_working", p=prior["f_working"], n=N)
    # The number of devices that the test declares to be working
    N_working_observed = pymc.Binomial(
        "N_working_observed",
        p=eps_intrinsic**0.5,
        n=N_working,
        observed=measurements["N_working_observed"])

    # Now construct the observed distribution of eps
    def logp(eps, mu, sigma):
        # The intrinsic distribution is a truncated normal
        eps_intrinsic = pymc.TruncatedNormal(
            "eps_intrinsic",
            mu=mu,
            sigma=sigma,
            lower=0,
            upper=1)
        # The probability that the test is correct depends on eps,
        # which biases the observed sample
        bias = 0.5 * pt.log(eps)
        return pymc.logp(eps_intrinsic, eps).eval() + bias

    eps_observed = pymc.CustomDist(
        "eps_observed",
        prior["mu"],
        prior["sigma"],
        logp=logp,
        observed=measurements["eps_observed"])

    idata = pymc.sample(3000)
The code doesn't work and fails with the following error:
MissingInputError: Input 0 (sigma_log__) of the graph (indices start from 0), used to compute Exp(sigma_log__), was not provided and not given a value. Use the PyTensor flag exception_verbosity='high', for more information on this error.
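My suspicion is that defining the TruncatedNormal random variable inside logp, and calling .eval() on it, is what breaks the graph. A sketch of the purely symbolic logp I think I need, using TruncatedNormal.dist so that no new random variable is registered in the model (note that the normalizing constant Z(mu, sigma) from the reweighting above is still missing here), would be:

def logp(eps, mu, sigma):
    # Log-density of the intrinsic truncated normal, as a symbolic
    # expression (TruncatedNormal.dist does not register a model variable)
    intrinsic = pymc.logp(
        pymc.TruncatedNormal.dist(mu=mu, sigma=sigma, lower=0, upper=1),
        eps)
    # Log of the detection probability eps**0.5
    bias = 0.5 * pt.log(eps)
    # NOTE: still unnormalized; the term -log(Z(mu, sigma)), with
    # Z = E[sqrt(eps)] under the truncated normal, is not included
    return intrinsic + bias

Is this the right direction, and is there a built-in way to handle the missing normalization?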