Bayesian Networks: Structure Learning in Python is extremely slow compared to R


I'm currently working on an image classification problem using Bayesian networks. I have tried pomegranate, pgmpy and bnlearn. My dataset contains more than 200,000 images, and for each image I run a feature extraction algorithm that produces a feature vector of size 1026.

pgmpy

from pgmpy.models import BayesianModel
from pgmpy.estimators import HillClimbSearch, BicScore

# Score and search on the same 20-row subset
subset = feature_df[:20]
est = HillClimbSearch(subset, scoring_method=BicScore(subset))
best_model = est.estimate()
edges = best_model.edges()
model = BayesianModel(edges)
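
One thing possibly worth trying is capping the search itself. A minimal sketch, assuming a pgmpy version whose estimate() accepts max_indegree (recent releases do, though the signature has moved around between versions):

# Restrict each node to at most 2 parents to shrink the search space
best_model = est.estimate(max_indegree=2)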

pomegranate

from pomegranate import BayesianNetwork

# Exact structure search is super-exponential in the number of variables
model = BayesianNetwork.from_samples(feature_df[:20], algorithm='exact')
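
With 1026 variables, the 'exact' algorithm alone would explain the freeze even on 20 rows. A minimal sketch of the cheaper options, assuming this pomegranate version supports the 'greedy' and 'chow-liu' values for algorithm:

# 'greedy' runs a hill-climb instead of an exhaustive search;
# 'chow-liu' restricts the structure to a tree and is faster still
model = BayesianNetwork.from_samples(feature_df[:20], algorithm='greedy')
tree_model = BayesianNetwork.from_samples(feature_df[:20], algorithm='chow-liu')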

bnlearn

library(bnlearn)
df <- read.csv('conv_encoded_images.csv')
df$Age <- as.numeric(df$Age)
res <- hc(df)                    # greedy hill-climbing structure search
model <- bn.fit(res, data = df)  # fit parameters given the learned structure

The bnlearn program in R finishes in a couple of minutes, while the pgmpy version runs for hours and pomegranate freezes my system after a few minutes. Note from the code above that I'm passing only the first 20 rows for training in both the pgmpy and pomegranate programs, while bnlearn gets the whole dataframe. Since I do all my image preprocessing and feature extraction in Python, switching between R and Python just for training is inconvenient.
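
Since bnlearn is the only variant that finishes quickly, one workaround might be to call it from Python rather than switching languages. A minimal sketch using rpy2 (assuming rpy2 >= 3 is installed; bn.fit becomes bn_fit because importr maps dots in R names to underscores):

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

bnlearn = importr('bnlearn')

# Convert the pandas feature dataframe to an R data.frame
with localconverter(ro.default_converter + pandas2ri.converter):
    r_df = ro.conversion.py2rpy(feature_df)

res = bnlearn.hc(r_df)                  # same hill-climb as in the R script
model = bnlearn.bn_fit(res, data=r_df)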

My data contains continuous values ranging from 0 to 1. I've also tried discretizing the data to 0s and 1s, which didn't resolve the issue.
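
A minimal sketch of one way to binarize such [0, 1] features (the 0.5 cutoff here is an arbitrary illustration, not necessarily the threshold I used):

# Threshold the continuous features into binary values
binary_df = (feature_df > 0.5).astype(int)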

Is there any way to speed up training in these Python packages, or am I doing something wrong in my code?

Thanks in advance for any help.

Edit:

https://drive.google.com/file/d/1HbAqDQ6Uv1417zPFMgWBInC7-gz233j2/view?usp=sharing

This is a dataset with 300 columns and ~40,000 rows, in case you want to try reproducing the issue.
