Define Bayesian Network and do Parameter Training with pgmpy

I want to create a BayesianNetwork with pgmpy in Python. I know the names of my nodes and the edges, i.e. the structure of the graph of my Bayesian network. I want to train the Bayesian network with 'labeled' data; this means I do not have CPDs or TabularCPDs defined beforehand. So I know the nodes, the edges, and the states (e.g. low/medium/high) that my node variables can take. How do I build a Bayesian network model/object using pgmpy? I looked at several examples (described below), but I do not understand how to define which states my observed and fault variables can take.

In this Parameter Learning example, data is available and the 'BayesianModel' is defined. I have data and I can put it in a similar pandas DataFrame format. I am using 'BayesianNetwork' instead of 'BayesianModel'. I do not understand what is happening from section 10.1.1 (State counts) onward. To define my BayesianNetwork, I have a dictionary of observed_variables and fault_variables, and a list of tuples defining the edges. How do I define it in the same way as in the example?

In another example here, Parameter Learning in Discrete Bayesian Networks, the model used is from an example so the BayesianNetwork object is already defined with all the correct attributes, but I want to build my own, so it is unclear how to do this.

In the parameter learning example, the state counts section is just an exploratory step to understand the data; you do not need it to learn the parameters. Here is a straightforward example.
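For intuition, the state counts step tallies, for every combination of parent states, how often each state of the child variable occurs. The plain-Python sketch below is not pgmpy code; the variables fault and temp, the toy rows, and the helper state_counts are hypothetical. It also shows why declaring the possible states matters: a state that never occurs in the data only shows up (with a zero count) because it is listed explicitly.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: each row is one labeled observation.
rows = [
    {"fault": "yes", "temp": "high"},
    {"fault": "yes", "temp": "high"},
    {"fault": "no", "temp": "low"},
    {"fault": "no", "temp": "medium"},
    {"fault": "no", "temp": "low"},
]

# Declared states, including states that may never appear in the data.
state_names = {
    "fault": ["no", "yes"],
    "temp": ["low", "medium", "high"],
}

def state_counts(rows, variable, parents, state_names):
    """Count each state of `variable` for every combination of parent states."""
    observed = Counter(
        (tuple(row[p] for p in parents), row[variable]) for row in rows
    )
    table = {}
    # Enumerate all parent configurations from the declared states,
    # not just the ones seen in the data.
    for config in product(*(state_names[p] for p in parents)):
        table[config] = {s: observed[(config, s)] for s in state_names[variable]}
    return table

counts = state_counts(rows, "temp", ["fault"], state_names)
for config, row in counts.items():
    print(config, row)
```

Here the parent state "yes" gets explicit zero counts for "low" and "medium", which would simply be missing if the states were inferred from the data alone.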

Simulate some data that we will use to learn the network parameters/CPDs.

from pgmpy.utils import get_example_model
model = get_example_model('alarm')
df = model.simulate(int(1e4))
edges = list(model.edges())

Now suppose we want to build our own model from the edge list edges and the dataset df.

# Create a model with edges
from pgmpy.models import BayesianNetwork
model = BayesianNetwork(edges)

# Learn the model CPDs using the given dataset
model.fit(df)

# Print one of the learned CPDs
print(model.cpds[0])
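To adapt this to the setup in the question (a dictionary of observed variables, a dictionary of fault variables, and a list of edge tuples), one option is to merge both dictionaries into a single mapping from variable name to its list of states. The sketch below assumes each dictionary maps a variable name to its possible states; the names temperature, pressure and pump_fault are hypothetical.

```python
# Hypothetical inputs in the shape described in the question:
# each dict maps a variable name to the list of states it can take.
observed_variables = {
    "temperature": ["low", "medium", "high"],
    "pressure": ["low", "medium", "high"],
}
fault_variables = {
    "pump_fault": ["no", "yes"],
}
edges = [("pump_fault", "temperature"), ("pump_fault", "pressure")]

# Merge both dicts into one {variable: states} mapping.
state_names = {**observed_variables, **fault_variables}

# Sanity check: every node used in an edge must have declared states.
missing = {node for edge in edges for node in edge} - set(state_names)
if missing:
    raise ValueError(f"No states declared for: {sorted(missing)}")

print(sorted(state_names))
```

With pgmpy, the edge list goes to BayesianNetwork(edges) as above, and the merged dict can be passed as model.fit(df, state_names=state_names) so that states absent from df still get entries in the learned CPDs. The column names of df must match the node names.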

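By default, the fit method computes Maximum Likelihood estimates of the CPDs. You can also pass estimator=BayesianEstimator (from pgmpy.estimators) to get Bayesian estimates that include a prior. The states of each variable are taken from the values that occur in the data; if a variable can take states that never appear in df, pass them explicitly through the state_names argument of fit, as a dict mapping each variable name to its list of states.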
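To see how the two estimators differ, here is a plain-Python sketch for a single CPD. It is not pgmpy's implementation, and the data and the helper estimate_cpd are hypothetical. The maximum likelihood estimate normalizes the state counts for each parent state. The BDeu variant, which as I understand it is what BayesianEstimator uses with prior_type="BDeu", first adds equivalent_sample_size / (child cardinality * number of parent configurations) to every count.

```python
# Hypothetical toy data for one child "temp" with a single parent "fault":
# each tuple is (parent_state, child_state).
rows = [("no", "low"), ("no", "low"), ("no", "medium"), ("yes", "high"), ("yes", "high")]
parent_states = ["no", "yes"]
child_states = ["low", "medium", "high"]

def estimate_cpd(rows, parent_states, child_states, equivalent_sample_size=0.0):
    """Return {parent_state: {child_state: probability}}.

    equivalent_sample_size=0 gives the maximum likelihood estimate
    (normalized counts); a positive value adds a BDeu-style pseudo-count
    to every cell before normalizing.
    """
    pseudo = equivalent_sample_size / (len(parent_states) * len(child_states))
    cpd = {}
    for p in parent_states:
        counts = {c: pseudo for c in child_states}
        for parent, child in rows:
            if parent == p:
                counts[child] += 1
        total = sum(counts.values())
        # With no data and no prior for this parent state, fall back to uniform.
        cpd[p] = {
            c: (n / total if total else 1 / len(child_states))
            for c, n in counts.items()
        }
    return cpd

mle = estimate_cpd(rows, parent_states, child_states)
bdeu = estimate_cpd(rows, parent_states, child_states, equivalent_sample_size=6)
print(mle)
print(bdeu)
```

With MLE, P(temp=high | fault=no) is exactly 0 because that combination never occurs; the pseudo-counts keep it small but nonzero. In pgmpy, the corresponding call is model.fit(df, estimator=BayesianEstimator, prior_type="BDeu", equivalent_sample_size=10).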