Transformer's Op name isn't available when setting opName

I created a custom transformer (a simple model that adds a string to a column value) to test MLeap serialization, but while writing my Op files for MLeap and Spark serialization, I couldn't find my transformer's name.

My reference.conf file looks like this:

my.domain.mleap.spark.ops = ["spark_side.CustomTransformerOp"]

// include the custom transformers ops we have defined to the default Spark registries
ml.combust.mleap.spark.registry.v20.ops += my.domain.mleap.spark.ops
ml.combust.mleap.spark.registry.v21.ops += my.domain.mleap.spark.ops
ml.combust.mleap.spark.registry.v22.ops += my.domain.mleap.spark.ops
ml.combust.mleap.spark.registry.v23.ops += my.domain.mleap.spark.ops

my.domain.mleap.ops = ["mleap_side.CustomTransformerOp"]

// include the custom transformers we have defined to the default MLeap registry
ml.combust.mleap.registry.default.ops += my.domain.mleap.ops

When I run the pipeline with only that stage on my dataset, it works fine. I'm even able to save the pipeline if I set opName to some string or to one of the Bundle.BuiltinOps members.

If I put in some string, an error pops up that says "unable to find key : thatString", and if I use another member, the error states that it's unable to find a key from that member (which is completely reasonable, and I understand why it happens).

My question is: how do I make the name of my transformer available when declaring opName in my Op files?

(if somebody could hit up Hollin Wilkins that would be amazing :D)

There are 1 best solutions below

I had the same question. According to the MLeap wiki page on adding a Spark transformer:

https://github.com/combust/mleap/wiki/Adding-an-MLeap-Spark-Transformer

you'll need to add the op name yourself to ml.combust.bundle.dsl.Bundle.BuiltinOps.

This comes from the note in Section 3, "Implement Bundle.ML serialization for MLeap":

Note: if implementing a vanilla Spark transformer, make sure to add the opName to ml.combust.bundle.dsl.Bundle.BuiltinOps.
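
To illustrate why the name has to be declared in one shared place, here is a rough, self-contained sketch. It does not use MLeap's real classes: OpRegistry, Op, CustomOps, SparkSideOp, MleapSideOp and the name "custom_transformer" are all hypothetical stand-ins. It only assumes that ops are looked up by their opName string, which is what the "unable to find key" error suggests.

```scala
// Simplified stand-in for a name-keyed op registry; NOT MLeap's actual API.
object CustomOps {
  // Hypothetical op name. Both Op files should reference this one constant
  // so the string written on save matches the string looked up on load.
  val customTransformer: String = "custom_transformer"
}

trait Op {
  def opName: String
}

// Stand-ins for spark_side.CustomTransformerOp and mleap_side.CustomTransformerOp.
object SparkSideOp extends Op { val opName: String = CustomOps.customTransformer }
object MleapSideOp extends Op { val opName: String = CustomOps.customTransformer }

final class OpRegistry(ops: Seq[Op]) {
  // Index the registered ops by name, as a name-keyed registry would.
  private val byName: Map[String, Op] = ops.map(op => op.opName -> op).toMap

  // Return the op registered under `name`, or an error message if none is.
  def lookup(name: String): Either[String, Op] =
    byName.get(name).toRight(s"unable to find key: $name")
}

object Demo {
  def main(args: Array[String]): Unit = {
    val mleapRegistry = new OpRegistry(Seq(MleapSideOp))

    // Same constant on both sides, so the lookup succeeds.
    println(mleapRegistry.lookup(SparkSideOp.opName).isRight)

    // An arbitrary string that no registered op declares fails the lookup.
    println(mleapRegistry.lookup("thatString"))
  }
}
```

In this toy model the lookup only succeeds when both sides declare exactly the same string, which is presumably why the wiki asks you to add the name to Bundle.BuiltinOps rather than typing a literal in each Op file.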