I want to use a sklearn model in pyspark.
I have trained a LightGBM model (through its sklearn API) that outputs probabilities for a propensity problem. I then converted this model to PMML format like this:
from sklearn2pmml import sklearn2pmml
sklearn2pmml(trained_model, 'prod_trained_model.pmml')
And then in pyspark, I read the PMML model like this:
from pypmml_spark import ScoreModel
model_pipeline = ScoreModel.fromFile('prod_trained_model.pmml')
Then I make predictions like this ('features_df' is a pyspark dataframe):
predictions_df = model_pipeline.transform(features_df)
Now the problem is that the PMML model's predictions do not match those of the original model: the predicted probabilities are shifted by 5% to 10%. Also, for around 5% of the rows in the input dataframe, the PMML model outputs NaN probabilities, whereas the original model predicts fine for those exact same rows.
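To make the mismatch concrete, the two probability columns can be compared side by side. A small sketch with made-up values (pandas is used here only for the comparison, not for scoring):

```python
import numpy as np
import pandas as pd

# Hypothetical probabilities from the two models (values are illustrative).
orig_probs = pd.Series([0.20, 0.55, 0.70, 0.10], name="original")
pmml_probs = pd.Series([0.25, 0.60, np.nan, 0.12], name="pmml")

# Fraction of rows where the PMML scorer returns NaN.
nan_rate = pmml_probs.isna().mean()

# Mean absolute probability shift on rows where both models produced a score.
shift = (pmml_probs - orig_probs).abs().dropna().mean()
```

Rows where the shift is large, or where the PMML output is NaN, are the ones to inspect for missing feature values.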
This is a common problem with missing / null values. PMML models do not process nulls the same way as your model does: LightGBM handles NaN natively during tree traversal, but the PMML representation may treat a missing input as invalid and produce a NaN score instead. This is why you should use .fillna() to impute missing values before scoring (and ideally apply the same imputation at training time, so the two models see identical inputs).
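A consistent way to apply this is to compute the fill values once from the training data and reuse the exact same values at scoring time. A minimal pandas sketch (column names are made up; the resulting dict can be passed straight to PySpark's DataFrame.fillna):

```python
import numpy as np
import pandas as pd

# Illustrative training features with missing values.
train = pd.DataFrame({
    "age":    [25.0, np.nan, 40.0],
    "income": [50_000.0, 62_000.0, np.nan],
})

# Derive per-column fill values from the training data (median here).
fill_values = train.median(numeric_only=True).to_dict()

# Impute before training ...
train_filled = train.fillna(fill_values)

# ... and reuse the exact same values before PMML scoring in PySpark:
# features_df = features_df.fillna(fill_values)
```

Using training-derived fill values on both sides keeps the sklearn model and the PMML scorer aligned, rather than imputing ad hoc in only one place.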