Running two different models in one PMML


I wonder whether it is possible to run two different models, a regressor and a classifier (with or without PMMLPipeline), and save them together in one PMML file. The models have different X matrices and different outputs, and they are not related to each other at all, but I want the lowest possible PMML loading time, so I am checking whether a combination like this is possible.

I saw that there is an option to run several models when all of them are regressors or all of them are classifiers, with the same X and y, and then to combine the outputs if needed (averaging and so on). That's not what I am looking for.


I wonder if there is an option to run two different models - a regressor and classifier - in one PMML file?

Are you looking to assemble a multi-output estimator? Scikit-Learn provides the MultiOutputClassifier and MultiOutputRegressor estimator types, but they fall short here, because they assume a single mining function type (i.e. all members are classifiers or all members are regressors, never a mix of them).

You may take a look at the sklearn2pmml.ensemble.EstimatorChain meta-estimator in its multioutput = True configuration:

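```python
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from sklearn2pmml.ensemble import EstimatorChain

# my_prefitted_regressor and my_prefitted_classifier are already-fitted estimators;
# the "True" predicate applies each step to every input row
estimator = EstimatorChain([
  ("regressor", my_prefitted_regressor, str(True)),
  ("classifier", my_prefitted_classifier, str(True))
], multioutput = True)

# Replace the ... placeholders with the input column names and the two target names
pmml_pipeline = make_pmml_pipeline(estimator, active_fields = ..., target_fields = ...)
sklearn2pmml(pmml_pipeline, "EstimatorChain.pmml.xml")
```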
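To make the idea behind the chain concrete, here is a plain-Python sketch of the semantics assumed above. It is not sklearn2pmml's implementation. Judging by the `str(True)` argument, the third element of each step appears to be a row predicate, and `"True"` selects every row. With `multioutput = True`, each step is assumed to contribute its own output column instead of being merged into one prediction. The `chain_predict` function and the toy models are hypothetical, and the predicates are plain Python callables here rather than expression strings.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Sequence[float]
# (name, predict function, row predicate): a hypothetical stand-in for an EstimatorChain step
Step = Tuple[str, Callable[[Row], object], Callable[[Row], bool]]


def chain_predict(steps: List[Step], X: List[Row]) -> List[Tuple[Optional[object], ...]]:
    """Apply every step to every row; a step whose predicate is False yields None.

    Each step fills its own output column (the multioutput idea), so the result
    has one column per step rather than a single combined prediction.
    """
    results = []
    for row in X:
        outputs = []
        for _name, predict, predicate in steps:
            outputs.append(predict(row) if predicate(row) else None)
        results.append(tuple(outputs))
    return results


def always(row: Row) -> bool:
    # Counterpart of the str(True) predicate: the step applies to all rows
    return True


steps: List[Step] = [
    ("regressor", lambda row: 2.0 * row[0] + 1.0, always),
    ("classifier", lambda row: "yes" if row[1] > 0.5 else "no", always),
]

print(chain_predict(steps, [[1.0, 0.2], [3.0, 0.9]]))  # [(3.0, 'no'), (7.0, 'yes')]
```

The regressor and the classifier never interact: each row simply gets one numeric and one categorical output side by side, which is the "two unrelated models in one file" shape the question asks for.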

For starters, you should assemble an EstimatorChain object from pre-fitted classifiers and regressors. Since you say that the models use disjoint feature sets, these pre-fitted estimators should be packaged as full-blown Pipeline objects with all the pre-processing steps included, so that each one selects and transforms its own columns from the shared input.
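A plain-Python sketch of why this packaging matters, assuming the combined model receives the union of both feature sets as its input: each member has to pick out its own columns before predicting, so neither model sees the other's features. `make_member`, the column names and the toy models are hypothetical. In Scikit-Learn the selection step would typically be something like a `ColumnTransformer` at the head of each `Pipeline`.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]


def make_member(columns: Sequence[str], model: Callable[[List[float]], object]) -> Callable[[Record], object]:
    """Hypothetical 'full-blown pipeline': select this member's columns, then predict.

    The selection comes first, so the model never sees features that belong
    to the other member.
    """
    def predict(record: Record) -> object:
        x = [record[c] for c in columns]  # per-member column selection
        return model(x)
    return predict


# Disjoint feature sets: the regressor uses "a" and "b", the classifier uses "c" only
regressor = make_member(["a", "b"], lambda x: x[0] + 0.5 * x[1])
classifier = make_member(["c"], lambda x: int(x[0] > 10))

# The combined input carries the union of both feature sets
record = {"a": 1.0, "b": 4.0, "c": 12.0}
print(regressor(record), classifier(record))  # 3.0 1
```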

TL;DR: What you're asking is easy from the PMML perspective (combining two independent models into a single PMML document), but rather difficult from the Scikit-Learn/Python perspective. It should be doable using EstimatorChain, but it might be justified to develop some special-purpose code for such merge activities. If you need technical advice regarding the SkLearn2PMML package, you may have better luck asking in its issue tracker.