Unable to serialize an Apache Spark transformer in MLeap


I use Spark 2.1.0 and Scala 2.11.8.

I am trying to build a Twitter sentiment analysis model in Apache Spark and serve it using MLeap.

When I run the model without MLeap, everything works. The problem occurs only when I try to save the model in MLeap's serialization format so that I can serve it later with MLeap.

Here is the code; the last line throws the error:

val modelSavePath = "/tmp/sampleapp/model-mleap/" 

val pipelineConfig = json.get("PipelineConfig").get.asInstanceOf[Map[String, Any]]
val loaderConfig = json.get("LoaderConfig").get.asInstanceOf[Map[String, Any]]
val loaderPath = loaderConfig
    .get("DataLocation")
    .get
    .asInstanceOf[String]
val data = sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("delimiter", "\t")
    .option("inferSchema", "true")
    .load(loaderPath)

val pipeline = Pipeline(pipelineConfig)

val model = pipeline.fit(data)
val mleapPipeline: Transformer = model

I get java.util.NoSuchElementException: key not found: org.apache.spark.ml.feature.Tokenizer on the last line.

A quick search suggested that MLeap does not support every Spark transformer, but I could not find an exhaustive list of the ones it does support.

How do I find out whether the transformers I am using are actually unsupported, or whether there is some other error?

Best answer

I am one of the creators of MLeap, and we do support Tokenizer! Which version of MLeap are you using? I think you may be looking at an outdated codebase from TrueCar. Check out our new codebase here:

https://github.com/combust/mleap
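For orientation, exporting a fitted pipeline with the combust codebase might look roughly like the sketch below. It is patterned on the export example in the MLeap documentation and is not a definitive implementation: the calls differ between MLeap releases, so check them against the version you install. The helper name exportToMleap and the bundle URI are hypothetical, the model is assumed to be a standard org.apache.spark.ml.PipelineModel, and resource._ comes from the scala-arm library, which is assumed to be on the classpath.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.sql.DataFrame
import resource._

// Hypothetical helper (not part of MLeap): writes a fitted Spark pipeline to an MLeap bundle.
// bundleUri is assumed to look like "jar:file:/tmp/sampleapp/model-mleap.zip".
def exportToMleap(model: PipelineModel, data: DataFrame, bundleUri: String): Unit = {
  // The transformed dataset lets MLeap record the pipeline's output schema.
  // Assumes an MLeap release in which SparkBundleContext has withDataset.
  val context = SparkBundleContext().withDataset(model.transform(data))

  // managed(...) closes the bundle file once the body has run.
  for (bundle <- managed(BundleFile(bundleUri))) {
    // save returns a Try; .get rethrows any serialization failure.
    model.writeBundle.save(bundle)(context).get
  }
}
```

With the names from the question, the call would be exportToMleap(model, data, "jar:file:/tmp/sampleapp/model-mleap.zip"), assuming pipeline.fit(data) returns a PipelineModel.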

We also have fairly complete documentation here, including a full list of supported transformers:

Documentation: http://mleap-docs.combust.ml/

Transformer List: http://mleap-docs.combust.ml/core-concepts/transformers/support.html
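To compare a pipeline against that list, it can help to print the class name of every stage in the fitted model. The sketch below uses only the standard Spark ML API. The helper name stageClassNames is hypothetical, and it assumes the fitted model is an org.apache.spark.ml.PipelineModel, which may not hold for the custom Pipeline wrapper in the question.

```scala
import org.apache.spark.ml.{PipelineModel, Transformer}

// Hypothetical helper: collects the fully qualified class name of every stage,
// descending into nested pipeline models.
def stageClassNames(model: PipelineModel): Seq[String] =
  model.stages.toSeq.flatMap {
    case nested: PipelineModel => stageClassNames(nested)
    case stage: Transformer    => Seq(stage.getClass.getName)
  }

// Usage, with `model` being the fitted pipeline from the question:
// stageClassNames(model).distinct.sorted.foreach(println)
```

Any printed name that has no counterpart in the supported-transformer list is a candidate for the "key not found" failure. If every name is listed, a version mismatch is the more likely cause.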

I hope this helps. If things still aren't working, file an issue on GitHub and we can help you debug it from there.