Sklearn2pmml doesn't seem to support custom feature conversion functions?

My pipeline uses custom conversion functions, and it cannot be converted successfully with sklearn2pmml.

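Here is my custom function code:

import pandas as pd

def calc_modify_days(X):
    # Reformat 'YYYYMMDD' as 'YYYY-MM-DD'; empty values and dates on or after 20221230 fall back to '2022-12-30'
    X['modify_date_new'] = X['modify_date'].apply(lambda x: x[:4] + '-' + x[4:6] + '-' + x[6:8] if x != '' and x < '20221230' else '2022-12-30')
    # Days between day_id and the modification date; negative differences become -1
    X['modify_days'] = (pd.to_datetime(X['day_id']) - pd.to_datetime(X['modify_date_new'])).dt.days
    X['modify_days'] = X['modify_days'].apply(lambda x: -1 if x < 0 else x)

    return X['modify_days']

def transform_channel_ty_cd(X):
    # Map each channel code through all_cate_dict (defined elsewhere); unknown codes map to 0
    return X.apply(lambda x: all_cate_dict['channel_type_cd_3'].get(x) if x in all_cate_dict['channel_type_cd_3'] else 0)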

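Below is the pipeline code, which works properly for prediction:

from sklearn.preprocessing import FunctionTransformer
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.pipeline import PMMLPipeline

mapper_encode = [
    (['day_id', 'modify_date'], FunctionTransformer(calc_modify_days), {'alias': 'modify_days'}),
    ('channel_type_cd_3', FunctionTransformer(transform_channel_ty_cd))]

mapper = DataFrameMapper(mapper_encode, input_df=True, df_out=True)

# clf_1 is the classifier, defined elsewhere
pipeline_test = PMMLPipeline(
    steps=[("mapper", mapper),
           ("classifier", clf_1)])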

But when I try to convert the pipeline to a PMML file, I get an error:

Standard output is empty
Standard error:
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 61 ms.
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Converting..
Oct 27, 2022 3:43:25 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
    at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:92)
    at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:63)
    at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:43)
    at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
    at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
    at sklearn.Initializer.encodeFeatures(Initializer.java:44)
    at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
    at sklearn.Composite.encodeFeatures(Composite.java:129)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
    at org.jpmml.sklearn.Main.run(Main.java:228)
    at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to numpy.core.UFunc
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    ... 12 more

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
    at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:92)
    at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:63)
    at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:43)
    at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
    at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
    at sklearn.Initializer.encodeFeatures(Initializer.java:44)
    at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
    at sklearn.Composite.encodeFeatures(Composite.java:129)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
    at org.jpmml.sklearn.Main.run(Main.java:228)
    at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to numpy.core.UFunc
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    ... 12 more

I looked into it, and the FunctionTransformer and the lambda functions seem to be the problem.

How should I solve it?

I tried converting the pipeline to a pkl.z file first and then to a PMML file, but a similar error occurred.

In addition, I tried removing the lambda functions, but it still doesn't work. The conversion fails whenever a custom feature function is involved.

There is 1 best solution below

This question has been answered in jpmml/sklearn2pmml#354.

In short, FunctionTransformer instances that wrap lambda functions (or reference local functions) cannot be fully pickled. This is a Python limitation: pickle stores a function by reference (its module and name), not by its code. The SkLearn2PMML package is simply complaining about an incomplete pipeline object here.
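To see why the pipeline object ends up incomplete, here is a minimal sketch that uses only the Python standard library. The helpers `referenced_names`, `can_pickle` and `make_local` are hypothetical names made up for this illustration, and `json.dumps` merely stands in for any named, importable function. The sketch shows that a pickled function carries nothing but a module name and a function name, and that lambdas and local functions cannot be pickled at all.

```python
import json
import pickle
import pickletools


def referenced_names(func):
    """Return the string arguments stored in the pickle of `func`."""
    payload = pickle.dumps(func, protocol=4)
    return [arg for _, arg, _ in pickletools.genops(payload) if isinstance(arg, str)]


def can_pickle(obj):
    """Report whether the standard pickle module can serialize `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


def make_local():
    """Build a function that only exists inside another function's scope."""
    def local_func(x):
        return x
    return local_func


# A named, importable function is stored as a (module, name) reference only;
# the function body itself is not part of the payload.
print(referenced_names(json.dumps))      # ['json', 'dumps']

# Lambdas and local functions have no importable name, so pickling fails.
print(can_pickle(lambda x: x.upper()))   # False
print(can_pickle(make_local()))          # False
```

A named function defined in the training script does pickle, but only as such a reference. A converter running outside that Python session has no code to resolve it against, which is consistent with the `ClassDictConstructor` placeholder in the stack trace above.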

In this case, the user was able to implement their datetime arithmetic business logic using standard PMML constructs (available as transformer classes in the sklearn2pmml.preprocessing module). There was no need to use lambda functions at all.