PyMC: how to define the product of two probability distributions?


I am trying to model the following problem with PyMC.

Description of the model

I have N devices that can be classified either as broken or working. The fraction of working devices is the unknown f_working. Each working device has an efficiency eps (different for each device), between 0 and 1, distributed as a TruncatedNormal('eps', mu=mu, sigma=sigma, lower=0, upper=1) with unknown mu and sigma.

Each device can be tested individually. The result of the test provides

  • whether the device is broken or working
  • in case it is working, its efficiency.

The test is not perfect, though. It has difficulty distinguishing low-efficiency devices from broken ones. So, if a device has efficiency eps, it has a probability P = f(eps) of being misclassified as broken. For simplicity, let's take f(eps) = 1 - eps**0.5.
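To make the misclassification rule concrete, here is a tiny Monte Carlo sanity check (just a sketch with an arbitrary eps value, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

eps = 0.25  # arbitrary efficiency, chosen only for illustration
n_trials = 100_000

# A working device passes the test with probability eps**0.5,
# so it is misclassified as broken with probability 1 - eps**0.5.
misclassified = rng.random(n_trials) >= eps**0.5

# With eps = 0.25, 1 - eps**0.5 = 0.5, and the empirical rate
# should be close to that.
print(misclassified.mean())
```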

After having tested all the N=300 devices, my goal is to infer the probability distribution of the parameters f_working, mu and sigma.

The issue

I don't understand how to model the observed distribution of eps with PyMC. Its density is proportional to the product of the intrinsic density (a TruncatedNormal) and the probability that the test classifies the device correctly (eps**0.5). While PyMC provides a way to sum distributions, using Mixture, I couldn't find a simple way to express a product of distributions.
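For reference, the observed density can be written down directly: it is the intrinsic TruncatedNormal pdf times the detection probability eps**0.5, divided by a normalizing constant that itself depends on mu and sigma. A quick numerical sketch with scipy (the parameter values here are just for illustration):

```python
import numpy as np
from scipy.stats import truncnorm

mu, sigma = 0.3, 0.5  # illustrative values, not inferred
# scipy's truncnorm takes the bounds in standardized units
a, b = (0 - mu) / sigma, (1 - mu) / sigma

eps = np.linspace(1e-6, 1.0, 10_000)
dx = eps[1] - eps[0]

intrinsic = truncnorm.pdf(eps, a, b, loc=mu, scale=sigma)
detection = eps**0.5  # probability the test flags the device as working
unnormalized = intrinsic * detection

# The normalizing constant Z is E[eps**0.5] under the intrinsic
# distribution; it depends on mu and sigma, so it cannot simply be
# dropped when those parameters are being inferred.
Z = np.sum(unnormalized) * dx
observed_pdf = unnormalized / Z
```

Since detection <= 1 everywhere, Z < 1: it is exactly the probability that a working device is classified as working.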

In the end, I resorted to defining a new CustomDist, which seems like overkill to me, and I still couldn't get it to work. Could you help me understand what is going on, and possibly suggest a better way to model this problem?

My attempt

The following code generates the observed sample:

import numpy as np
import pymc
import pymc.math as pt

rng = np.random.default_rng(1234)

true_params={
    "f_working" : 0.8,
    "mu" : 0.3,
    "sigma" : 0.5
}
N = 300

# Generate the intrinsic distributions:
# Get the number of working devices
N_working = rng.binomial(n=N, p=true_params["f_working"])

# Sample the intrinsic distribution of efficiencies
eps_intrinsic = rng.normal(
        loc=true_params["mu"],
        scale=true_params["sigma"],
        size=N*100) # This should be enough
# Reject values outside of 0-1
eps_intrinsic = eps_intrinsic[(eps_intrinsic>=0) & (eps_intrinsic<=1)]
eps_intrinsic = eps_intrinsic[:N_working]

# Generate the observed distributions:
# Determine which of the working devices are correctly classified as working
working_observed = rng.random(N_working) < eps_intrinsic**0.5

# Generate the set of observations
measurements = {
    "N_working_observed": np.sum(working_observed)*np.ones(working_observed.shape),
    "eps_observed": eps_intrinsic[working_observed]
}

The following is how I am modelling it with PyMC:

with pymc.Model() as model:
    # Define the priors    
    prior = {
        "f_working": pymc.Uniform("f_working", lower=0, upper=1),
        "mu": pymc.Normal("mu", mu=0.5, sigma=3),
        "sigma": pymc.Exponential("sigma", lam=2),
    }

    # The number of working devices
    N_working = pymc.Binomial("N_working", p=prior["f_working"], n=N)
    # The number of devices that the test declares to be working
    N_working_observed = pymc.Binomial("N_working_observed",
                            p=eps_intrinsic**0.5,
                            n=N_working,
                            observed=measurements["N_working_observed"])

    # Now constructing the observed distribution of eps 
    def logp(eps, mu, sigma):
        # The intrinsic distribution is a truncated normal
        eps_intrinsic = pymc.TruncatedNormal("eps_intrinsic",
                            mu=mu,
                            sigma=sigma,
                            lower=0,
                            upper=1)
        # The probability that the test is correct depends on eps, this introduces a bias
        bias = 0.5*pt.log(eps)
        return pymc.logp(eps_intrinsic, eps).eval() + bias

    eps_observed = pymc.CustomDist("eps_observed",
                                   prior["mu"],
                                   prior["sigma"],
                                   logp=logp,
                                   observed=measurements["eps_observed"])
    
    idata = pymc.sample(3000)

The code doesn't work and gives the following error:

MissingInputError: Input 0 (sigma_log__) of the graph (indices start from 0), used to compute Exp(sigma_log__), was not provided and not given a value. Use the PyTensor flag exception_verbosity='high', for more information on this error.