Import an sklearn2pmml-generated .pmml file back into Scikit-Learn or Python


Apologies if this has already been answered somewhere, but I've been looking for about an hour and can't find a good answer.

I have a simple Logistic Regression model trained in Scikit-Learn that I'm exporting to a .pmml file.

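  from sklearn.linear_model import LogisticRegression
  from sklearn2pmml import sklearn2pmml
  from sklearn2pmml.pipeline import PMMLPipeline

  # PMMLPipeline takes a list of (name, estimator) steps
  my_pipeline = PMMLPipeline([
      ("classifier", LogisticRegression())
  ])
  my_pipeline.fit(X, y)  # X, y: your training features and labels
  sklearn2pmml(my_pipeline, "filename.pmml")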

etc.

What I'm wondering is whether (and how) I can import this file back into Python (preferably 2.7) or Scikit-Learn and use it as I would in Java/Scala, something along the lines of:

  import (filename.pmml) as pm
  pm.predict(data)

Thanks for any help!


Accepted answer:

Scikit-learn does not offer support for importing PMML files, so I'm afraid what you're trying to achieve cannot be done with scikit-learn itself.

Libraries such as sklearn2pmml exist to fill a gap in scikit-learn: exporting models to the PMML format, which scikit-learn does not support on its own.

Typically, people who use sklearn2pmml want to re-use their PMML models on other platforms (e.g. IBM's SPSS, Apache Spark ML, Weka, or any other consumer listed on the Data Mining Group's website).
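To make concrete what a PMML consumer does with such a file, here is a minimal, hand-rolled sketch (Python standard library only) that reads a binary logistic-regression `RegressionModel` and scores one row. Everything here is an assumption for illustration: the PMML document is hand-written with made-up coefficients (real sklearn2pmml output may be structured differently), the helper names `load_binary_logit` and `predict_proba` are hypothetical, and it follows the PMML convention for binary `logit` normalization, where only the first `RegressionTable` is scored and the second category gets `1 - p`. It is not a substitute for a full PMML engine.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML with made-up coefficients. It is NOT copied
# from sklearn2pmml output, whose structure may differ.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="categorical" dataType="string">
      <Value value="0"/>
      <Value value="1"/>
    </DataField>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0" targetCategory="1">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_binary_logit(xml_text):
    """Read the two RegressionTables of a binary logit RegressionModel."""
    root = ET.fromstring(xml_text)
    model = root.find("pmml:RegressionModel", NS)
    first, second = model.findall("pmml:RegressionTable", NS)
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in first.findall("pmml:NumericPredictor", NS)
    }
    return {
        "positive": first.get("targetCategory"),
        "negative": second.get("targetCategory"),
        "intercept": float(first.get("intercept")),
        "coefs": coefs,  # 'exponent' attributes are ignored (assumed to be 1)
    }


def predict_proba(model, row):
    """Linear score of the first table, squashed by the logistic function."""
    z = model["intercept"] + sum(c * row[name] for name, c in model["coefs"].items())
    p = 1.0 / (1.0 + math.exp(-z))
    return {model["positive"]: p, model["negative"]: 1.0 - p}


model = load_binary_logit(PMML_DOC)
print(predict_proba(model, {"x1": 1.0, "x2": 0.0}))
```

Anything beyond this toy case (transformations, missing values, multiclass, other model types) is exactly what a proper PMML consumer handles for you.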

If you want to save a model created with scikit-learn and re-use it later with scikit-learn, you should instead look at Python's native persistence mechanism, pickle, which uses a binary data format.
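A minimal sketch of that round trip, assuming a tiny made-up training set and a temporary file name `model.pkl` chosen only for illustration:

```python
import os
import pickle
import tempfile

from sklearn.linear_model import LogisticRegression

# Tiny, made-up training set: one feature, two classes
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# Hypothetical location for the saved model
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save: serialize the fitted estimator to a binary file
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load: deserialize it again (only unpickle files from sources you trust)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[0.5], [2.5]]))
```

Note that a pickle is tied to Python and, in practice, to compatible scikit-learn versions, which is one of the known issues mentioned below; PMML is the portable option when the consumer is not Python.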

You can read more about how to save/load models in Pickle format (together with its known issues) here.

Answer:

I believe you can import and export a PMML file with Python. After you load your model back, you can predict again without any problem. However, the output format may differ, e.g. a 1-D array versus an n×m pandas DataFrame.

from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from pypmml import Model

# Export to PMML (yourModelObjectGoesHere is your fitted estimator)
yourModelPipeline = make_pmml_pipeline(yourModelObjectGoesHere)
sklearn2pmml(yourModelPipeline, "yourModel.pmml")

# Load from PMML and predict
yourModelLoaded = Model.fromFile('yourModel.pmml')
prediction = yourModelLoaded.predict(yourPredictionDataSet)

Lastly, reproducing the results may take a long time; don't let it discourage you :). Here is the developer's comment on the issue: https://github.com/autodeployai/pypmml/issues/53

Answer:

You could use PyPMML to make predictions on a new dataset using PMML in Python, for example:

from pypmml import Model

model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)

The data can be a dict, a JSON string, or a pandas Series or DataFrame.

Answer:

I created a simple solution that generates scikit-learn KMeans models from PMML files I exported from the KNIME Analytics Platform. You can check it out: pmml2sklearn
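The core idea behind rebuilding k-means from PMML can be sketched with the standard library alone: read the cluster centers from a `ClusteringModel` and assign each row to the nearest center by squared Euclidean distance. This is an illustrative assumption, not how pmml2sklearn is implemented: the PMML document below is hand-written with two made-up centers, `load_centers` and `assign` are hypothetical helpers, and the sketch assumes `<squaredEuclidean/>` as the comparison measure with no field weights or missing values.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ClusteringModel with two made-up centers
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
  </DataDictionary>
  <ClusteringModel functionName="clustering" modelClass="centerBased" numberOfClusters="2">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
    </MiningSchema>
    <ComparisonMeasure kind="distance">
      <squaredEuclidean/>
    </ComparisonMeasure>
    <ClusteringField field="x1"/>
    <ClusteringField field="x2"/>
    <Cluster name="cluster_0">
      <Array n="2" type="real">0.0 0.0</Array>
    </Cluster>
    <Cluster name="cluster_1">
      <Array n="2" type="real">5.0 5.0</Array>
    </Cluster>
  </ClusteringModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_centers(xml_text):
    """Return the clustering field order and a list of (name, center) pairs."""
    root = ET.fromstring(xml_text)
    model = root.find("pmml:ClusteringModel", NS)
    fields = [f.get("field") for f in model.findall("pmml:ClusteringField", NS)]
    centers = []
    for cluster in model.findall("pmml:Cluster", NS):
        values = [float(v) for v in cluster.find("pmml:Array", NS).text.split()]
        centers.append((cluster.get("name"), values))
    return fields, centers


def assign(fields, centers, row):
    """Name of the center closest to `row` (squared Euclidean distance)."""
    point = [row[f] for f in fields]

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))

    name, _ = min(centers, key=lambda c: sq_dist(c[1]))
    return name


fields, centers = load_centers(PMML_DOC)
print(assign(fields, centers, {"x1": 1.0, "x2": 0.5}))
```

The extracted centers are plain lists, so they could also be converted to an array if you want to hand them to scikit-learn yourself.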