To create a Spark ML object, we just need to know:
- The type of model
- The parameters for the model
I am brainstorming a way to pass this information as JSON
and instantiate a Spark ML object from it.
For example, given this JSON:
{
  "model": "RandomForestClassifier",
  "numTrees": 10,
  "featuresCol": "binaryFeatures"
}
it would instantiate a Random Forest model:
val rf = new RandomForestClassifier().setNumTrees(10).setFeaturesCol("binaryFeatures")
It is fairly straightforward to write a custom JSON
serializer/deserializer on my own. Scala's pattern matching
seems like a good fit for dynamically instantiating an object from its name as a string. However, when the objects get more complex (e.g. supporting pipelines), a custom serializer becomes hard to maintain.
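To make the idea concrete, here is a minimal sketch of what I mean by the pattern-match approach. The `fromSpec` function and the string keys are hypothetical; it relies on the fact that every Spark ML `PipelineStage` exposes its params via `getParam`, so the remaining JSON fields can be applied generically instead of hand-coding each setter:

```scala
import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.classification.RandomForestClassifier

// Hypothetical helper: map the "model" field to a constructor via
// pattern matching, then apply the remaining fields as params.
def fromSpec(model: String, params: Map[String, Any]): PipelineStage = {
  val stage = model match {
    case "RandomForestClassifier" => new RandomForestClassifier()
    case other => throw new IllegalArgumentException(s"Unknown model: $other")
  }
  // Each JSON key is assumed to match a Spark ML param name,
  // e.g. "numTrees" -> stage.getParam("numTrees").
  params.foreach { case (name, value) =>
    stage.set(stage.getParam(name), value)
  }
  stage
}

val rf = fromSpec("RandomForestClassifier",
  Map("numTrees" -> 10, "featuresCol" -> "binaryFeatures"))
```

This works for a single estimator, but the maintenance problem shows up as soon as the spec needs to describe nested structures like a `Pipeline` of stages.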
Is there any existing implementation for this? If not, what should the json
structure look like?