My pipeline uses custom conversion functions, and it cannot be converted successfully with sklearn2pmml.
Here is my custom function code:
    def calc_modify_days(X):
        X['modify_date_new'] = X['modify_date'].apply(lambda x: x[:4]+'-'+x[4:6]+'-'+x[6:8] if x != '' and x < '20221230' else '2022-12-30')
        X['modify_days'] = (pd.to_datetime(X['day_id']) - pd.to_datetime(X['modify_date_new'])).dt.days
        X['modify_days'] = X['modify_days'].apply(lambda x: -1 if x < 0 else x)
        return X['modify_days']

    def transform_channel_ty_cd(X):
        return X.apply(lambda x: all_cate_dict['channel_type_cd_3'].get(x) if x in all_cate_dict['channel_type_cd_3'] else 0)
Below is the pipeline code, which works properly for prediction:
    mapper_encode = [
        (['day_id', 'modify_date'], FunctionTransformer(calc_modify_days), {'alias': 'modify_days'}),
        ('channel_type_cd_3', FunctionTransformer(transform_channel_ty_cd))]
    mapper = DataFrameMapper(mapper_encode, input_df=True, df_out=True)
    pipeline_test = PMMLPipeline(
        steps=[("mapper", mapper),
               ("classifier", clf_1)])
But when I try to convert the pipeline to a PMML file, I get the following error:
Standard output is empty
Standard error:
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 61 ms.
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Converting..
Oct 27, 2022 3:43:25 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:92)
at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:63)
at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:43)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
at sklearn.Initializer.encodeFeatures(Initializer.java:44)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn.Composite.encodeFeatures(Composite.java:129)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
at org.jpmml.sklearn.Main.run(Main.java:228)
at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to numpy.core.UFunc
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
... 12 more
Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:92)
at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:63)
at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:43)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
at sklearn.Initializer.encodeFeatures(Initializer.java:44)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn.Composite.encodeFeatures(Composite.java:129)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
at org.jpmml.sklearn.Main.run(Main.java:228)
at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to numpy.core.UFunc
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
... 12 more
I looked into it, and the FunctionTransformer and lambda functions seem to be the problem.
How should I solve it?
I also tried saving the pipeline to a pkl.z file first and then converting that to a PMML file, but a similar error occurred.
In addition, I tried removing the lambda functions, but conversion still fails as long as any custom feature handler is involved.
This question has been answered in jpmml/sklearn2pmml#354
In short, the inability to pickle FunctionTransformer instances that contain lambda functions (or reference local functions) is a Python limitation. The SkLearn2PMML package is simply complaining about an incomplete pipeline object here.
In the current case, the user was able to implement their datetime arithmetic business logic using standard PMML constructs (implemented as transformer classes in the sklearn2pmml.preprocessing module). There was no need for lambda functions at all.
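For readers who want to see the Python limitation in isolation, here is a minimal sketch that uses only the standard library. The helper name can_pickle is hypothetical and is not part of sklearn2pmml or scikit-learn. It shows that pickle rejects lambdas and local functions, and that even a picklable function is stored only as a module-plus-name reference, not as code. That would be consistent with the stack trace above, where the converter receives an unresolved reference for func and fails to cast it to numpy.core.UFunc, though that reading of the trace is an interpretation and not something the original answer states.

```python
import math
import pickle


def can_pickle(func):
    """Hypothetical helper: True if pickle.dumps accepts `func`."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_local():
    # A function defined inside another function has no importable name.
    def local_func(x):
        return x + 1
    return local_func


# An importable function is pickled by reference (module name + function name).
importable_ok = can_pickle(math.sqrt)

# Lambdas and local functions cannot be looked up by name, so pickling fails.
lambda_ok = can_pickle(lambda x: x + 1)
local_ok = can_pickle(make_local())

# The payload for an importable function holds only its names, not its body.
payload = pickle.dumps(math.sqrt)
stores_names_only = b"math" in payload and b"sqrt" in payload

print(importable_ok, lambda_ok, local_ok, stores_names_only)
```

Because only a name is stored, a non-Python consumer of the pickle file has nothing to translate unless it already knows what that name means, which is why expressing the logic with transformer classes the converter recognizes avoids the problem.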