I wonder if there is a way to train two different models, a regressor and a classifier (with PMMLPipeline or not), and eventually save them in one PMML file? The models have different X matrices and different outputs, and they are not actually related to each other, but I want the lowest possible PMML loading time, so I am checking whether a combination like this is possible.
I saw that there is an option to combine different models when all of them are regressors or all are classifiers, with the same X and y, and then to manipulate the output if needed (averaging and so on). That's not what I am looking for.
Are you looking to assemble a multi-output estimator? Scikit-Learn provides the MultiOutputClassifier and MultiOutputRegressor estimator types, but they fall short here, because they assume a single mining function type (i.e. all members are classifiers or all are regressors, but not a mix of the two).
You may take a look at the sklearn2pmml.ensemble.EstimatorChain meta-estimator in its multioutput = True configuration.
For starters, you should assemble an EstimatorChain object from pre-fitted classifiers and regressors. Since you say that they use disjoint feature sets, these pre-fitted estimators should be packaged as full-blown Pipeline objects with all the pre-processing steps included.
TLDR: What you're asking is easy from the PMML perspective (combining two independent models into a single PMML document) but rather difficult from the Scikit-Learn/Python perspective. It should be doable using EstimatorChain, but perhaps it would be justified to develop some special-purpose code for such merge activities. If you need technical advice regarding the SkLearn2PMML package, you might have better luck asking in its issue tracker.
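A rough sketch of the assembly described above, under several assumptions: the column names, toy data and model choices are made up for illustration, and the three-element (name, estimator, predicate) step format with an always-true predicate is my reading of the EstimatorChain constructor, so please check it against the SkLearn2PMML version you have installed. Each model is a pre-fitted Pipeline that selects its own columns from one shared wide data frame, which is how the disjoint feature sets are handled here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): one wide frame holding the inputs of both models.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["c1", "c2", "r1", "r2"])
y_class = (X["c1"] + X["c2"] > 0).astype(int)
y_reg = 2.0 * X["r1"] - X["r2"]

clf_cols = ["c1", "c2"]  # features used only by the classifier
reg_cols = ["r1", "r2"]  # features used only by the regressor

# Each model is a complete, pre-fitted Pipeline. The ColumnTransformer keeps
# only the listed columns (the remainder is dropped by default).
classifier = Pipeline([
    ("select", ColumnTransformer([("scale", StandardScaler(), clf_cols)])),
    ("model", LogisticRegression()),
]).fit(X, y_class)

regressor = Pipeline([
    ("select", ColumnTransformer([("scale", StandardScaler(), reg_cols)])),
    ("model", LinearRegression()),
]).fit(X, y_reg)


def build_chain(classifier, regressor):
    """Wrap two pre-fitted pipelines into one multi-output EstimatorChain.

    Assumption: steps are (name, estimator, predicate) tuples, where the
    predicate is a Python expression string; str(True) selects every row.
    """
    # Imported lazily so the rest of the sketch runs without sklearn2pmml.
    from sklearn2pmml.ensemble import EstimatorChain

    return EstimatorChain([
        ("classifier", classifier, str(True)),
        ("regressor", regressor, str(True)),
    ], multioutput=True)
```

Calling build_chain(classifier, regressor) should give a single object to hand over to the SkLearn2PMML converter. The conversion step itself is not shown, since it needs a Java runtime and I have not verified that this exact configuration converts cleanly.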