PyMC: how to define the product of two probability distributions?


I am trying to model the following problem with PyMC.

Description of the model

I have N devices that can be classified either as broken or working. The fraction of working devices is the unknown f_working. Each working device has an efficiency eps (different for each device), between 0 and 1, distributed as a TruncatedNormal('eps', mu=mu, sigma=sigma, lower=0, upper=1) with unknown mu and sigma.

Each device can be tested individually. The result of the test provides

  • whether the device is broken or working
  • in case it is working, its efficiency.

The test is not perfect, though. It has difficulty distinguishing low-efficiency devices from broken ones. So, if a device has efficiency eps, it has a probability P = f(eps) of being misclassified as broken. For simplicity, let's take f(eps) = 1 - eps**0.5.
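To make the misclassification rule concrete, here is a tiny Monte Carlo sanity check (just a sketch with an arbitrary eps value, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

eps = 0.25  # arbitrary efficiency, chosen only for illustration
n_trials = 100_000

# A working device passes the test with probability eps**0.5,
# so it is misclassified as broken with probability 1 - eps**0.5.
misclassified = rng.random(n_trials) >= eps**0.5

# With eps = 0.25, 1 - eps**0.5 = 0.5, and the empirical rate
# should be close to that.
print(misclassified.mean())
```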

After having tested all the N=300 devices, my goal is to infer the probability distribution of the parameters f_working, mu and sigma.

The issue

I don't understand how to model the observed distribution of eps with PyMC. Its density is proportional to the product of the intrinsic density (a TruncatedNormal) and the probability that the test classifies the device correctly (eps**0.5). While PyMC provides a way to sum distributions, using Mixture, I couldn't find a simple way to express a product of distributions.
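For reference, the observed density can be written down directly: it is the intrinsic TruncatedNormal pdf times the detection probability eps**0.5, divided by a normalizing constant that itself depends on mu and sigma. A quick numerical sketch with scipy (the parameter values here are just for illustration):

```python
import numpy as np
from scipy.stats import truncnorm

mu, sigma = 0.3, 0.5  # illustrative values, not inferred
# scipy's truncnorm takes the bounds in standardized units
a, b = (0 - mu) / sigma, (1 - mu) / sigma

eps = np.linspace(1e-6, 1.0, 10_000)
dx = eps[1] - eps[0]

intrinsic = truncnorm.pdf(eps, a, b, loc=mu, scale=sigma)
detection = eps**0.5  # probability the test flags the device as working
unnormalized = intrinsic * detection

# The normalizing constant Z is E[eps**0.5] under the intrinsic
# distribution; it depends on mu and sigma, so it cannot simply be
# dropped when those parameters are being inferred.
Z = np.sum(unnormalized) * dx
observed_pdf = unnormalized / Z
```

Since detection <= 1 everywhere, Z < 1: it is exactly the probability that a working device is classified as working.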

In the end, I resorted to defining a new CustomDist, which seems like overkill to me, and I still couldn't get it to work. Could you help me understand what is going on, and possibly suggest a better way to model this problem?

My attempt

The following code generates the observed sample:

import numpy as np
import pymc
import pymc.math as pt

rng = np.random.default_rng(1234)

true_params={
    "f_working" : 0.8,
    "mu" : 0.3,
    "sigma" : 0.5
}
N = 300

# Generate the intrinsic distributions:
# Get the number of working devices
N_working = rng.binomial(n=N, p=true_params["f_working"])

# Sample the intrinsic distribution of efficiencies
eps_intrinsic = rng.normal(
        loc=true_params["mu"],
        scale=true_params["sigma"],
        size=N*100) # This should be enough
# Reject values outside of 0-1
eps_intrinsic = eps_intrinsic[(eps_intrinsic>=0) & (eps_intrinsic<=1)]
eps_intrinsic = eps_intrinsic[:N_working]

# Generate the observed distributions:
# Determine which of the working devices are correctly classified as working
working_observed = rng.random(N_working) < eps_intrinsic**0.5

# Generate the set of observations
measurements = {
    "N_working_observed": np.sum(working_observed)*np.ones(working_observed.shape),
    "eps_observed": eps_intrinsic[working_observed]
}

The following is how I am modelling it with PyMC:

with pymc.Model() as model:
    # Define the priors    
    prior = {
        "f_working": pymc.Uniform("f_working", lower=0, upper=1),
        "mu": pymc.Normal("mu", mu=0.5, sigma=3),
        "sigma": pymc.Exponential("sigma", lam=2),
    }

    # The number of working devices
    N_working = pymc.Binomial("N_working", p=prior["f_working"], n=N)
    # The number of devices that the test declares to be working
    N_working_observed = pymc.Binomial("N_working_observed",
                            p=eps_intrinsic**0.5,
                            n=N_working,
                            observed=measurements["N_working_observed"])

    # Now constructing the observed distribution of eps 
    def logp(eps, mu, sigma):
        # The intrinsic distribution is a truncated normal
        eps_intrinsic = pymc.TruncatedNormal("eps_intrinsic",
                            mu=mu,
                            sigma=sigma,
                            lower=0,
                            upper=1)
        # The probability that the test is correct depends on eps, this introduces a bias
        bias = 0.5*pt.log(eps)
        return pymc.logp(eps_intrinsic, eps).eval() + bias

    eps_observed = pymc.CustomDist("eps_observed",
                                   prior["mu"],
                                   prior["sigma"],
                                   logp=logp,
                                   observed=measurements["eps_observed"])
    
    idata = pymc.sample(3000)

The code doesn't work and gives the following error:

MissingInputError: Input 0 (sigma_log__) of the graph (indices start from 0), used to compute Exp(sigma_log__), was not provided and not given a value. Use the PyTensor flag exception_verbosity='high', for more information on this error.