Bayesian network in Python: both construction and sampling


For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian network. After some exploration on the internet, I found that Pomegranate is a good package for Bayesian networks; however, as far as I can tell, it seems impossible to sample from such a pre-defined Bayesian network. For example, model.sample() raises a NotImplementedError (even though this solution suggests otherwise).

Does anyone know if there exists a library which provides a good interface for the construction and sampling of/from a Bayesian network?

5 Answers

BEST ANSWER

I found out that pyAgrum (https://agrum.gitlab.io/pages/pyagrum.html) does the job. It can be used both to create a Bayesian network via the BayesNet() class and to sample from such a network via the drawSamples() method of the BNDatabaseGenerator class.
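
A minimal sketch of that workflow (the network, variable names, and CPT values below are only illustrative, and method names may differ slightly between pyAgrum versions):

import pyAgrum as gum

# build a small network A -> B by hand
bn = gum.BayesNet()
a = bn.add(gum.LabelizedVariable("A", "A", 2))
b = bn.add(gum.LabelizedVariable("B", "B", 2))
bn.addArc(a, b)

# fill the conditional probability tables
bn.cpt("A").fillWith([0.4, 0.6])
bn.cpt("B")[{"A": 0}] = [0.8, 0.2]
bn.cpt("B")[{"A": 1}] = [0.1, 0.9]

# draw 1000 samples and write them to a CSV file
gen = gum.BNDatabaseGenerator(bn)
gen.drawSamples(1000)
gen.toCSV("samples.csv")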


Another option is pgmpy which is a Python library for learning (structure and parameter) and inference (statistical and causal) in Bayesian Networks.

You can generate forward and rejection samples as a Pandas dataframe or numpy recarray.

The following code generates 20 forward samples from the Bayesian network "diff -> grade <- intel" as a recarray.

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.sampling import BayesianModelSampling

# network structure: diff -> grade <- intel
student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])

# conditional probability tables for the three variables
cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
cpd_g = TabularCPD('grade', 3,
                   [[0.3, 0.05, 0.9, 0.5],
                    [0.4, 0.25, 0.08, 0.3],
                    [0.3, 0.7, 0.02, 0.2]],
                   evidence=['intel', 'diff'], evidence_card=[2, 2])

student.add_cpds(cpd_d, cpd_i, cpd_g)

# draw 20 forward samples as a numpy recarray
inference = BayesianModelSampling(student)
df_samples = inference.forward_sample(size=20, return_type='recarray')

print(df_samples)
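
For samples that are consistent with given evidence, the same object also offers rejection sampling. A minimal sketch continuing the example above, assuming the State helper and rejection_sample signature shown in older pgmpy docstrings (the exact API has changed across pgmpy versions):

from pgmpy.factors.discrete import State

# keep only samples in which diff == 0
evidence = [State(var='diff', state=0)]
df_rej = inference.rejection_sample(evidence=evidence, size=20, return_type='dataframe')
print(df_rej)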

Another option is BayesPy (https://www.bayespy.org/index.html). You build the network out of nodes, and on every node you can call random(), which samples from that node's distribution: https://www.bayespy.org/dev_api/generated/generated/bayespy.inference.vmp.nodes.stochastic.Stochastic.random.html#bayespy.inference.vmp.nodes.stochastic.Stochastic.random
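
A minimal sketch of that pattern for a single categorical node, assuming the Categorical node type implements random() (not every BayesPy node type does, and the probabilities here are purely illustrative):

from bayespy.nodes import Categorical

# a categorical node with fixed prior probabilities
A = Categorical([0.3, 0.7])

# draw a sample from the node's current distribution (here, its prior)
print(A.random())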


Using pyAgrum, you just have to:

# import pyAgrum
import pyAgrum as gum

# create a BN
bn = gum.fastBN("A->B[3]<-C{yes|No}->D")

# specify some CPTs (fastBN fills them randomly by default)
bn.cpt("A").fillWith([0.3, 0.7])

# and then generate a database; generateCSV returns the log-likelihood of the generated database
gum.generateCSV(bn, "sample.csv", 1000, with_labels=True, random_order=False)

the code in a notebook

See http://webia.lip6.fr/~phw/aGrUM/docs/last/notebooks/ for more notebooks using pyAgrum

Disclaimer: I am one of the authors of pyAgrum :-)


I was also searching for a Python library to work with Bayesian networks (learning, sampling, inference) and I found bnlearn. I tried a couple of examples and it worked. It is possible to import several existing repositories or any network in .bif format. As per the library's documentation:

Sampling of data is based on forward sampling from the joint distribution of the Bayesian network. In order to do that, it requires as input a DAG connected with CPDs. It is also possible to create a DAG manually (see the create DAG section) or load an existing one.
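
A minimal sketch of that workflow, assuming bnlearn's import_DAG and sampling functions and its bundled 'sprinkler' example network (function names may differ between bnlearn versions):

import bnlearn

# load one of the example networks that ships with bnlearn
model = bnlearn.import_DAG('sprinkler')

# forward-sample 1000 records from the joint distribution as a pandas DataFrame
df = bnlearn.sampling(model, n=1000)
print(df.head())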