Export spark feature transformation pipeline to a file

Asked by Gowrav on 19 November 2018 at 17:41

PMML, MLeap and PFA currently support only row-based transformations. None of them supports frame-based transformations such as aggregations, group-bys or joins. What is the recommended way to export a Spark pipeline that contains these operations?

Answered by user1808924 on 26 November 2018 at 10:27

PMML and PFA are standards for representing machine learning models, not data processing pipelines. A machine learning model takes in a data record, performs some computation on it, and emits an output data record. So, by definition, you are working with a single, isolated data record, not with a collection (frame, matrix) of data records.
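To make the distinction concrete, here is a minimal pure-Python sketch (not PMML or PFA code; the field names and the scoring formula are made up for illustration). A row-based transformation needs only the current record, while an aggregate such as a per-group mean needs every record in the frame, so it cannot be written as a function of one record.

```python
from statistics import mean


# Row-based: the output depends only on the single record passed in.
# Hypothetical linear "model"; the weights are illustrative only.
def score_record(record: dict) -> dict:
    return {**record, "score": 0.5 * record["amount"] + 1.0}


# Frame-based: each output depends on all records sharing the same key.
def add_group_mean(records: list[dict], key: str, value: str) -> list[dict]:
    groups: dict = {}
    for r in records:
        groups.setdefault(r[key], []).append(r[value])
    means = {k: mean(v) for k, v in groups.items()}
    return [{**r, f"{value}_mean": means[r[key]]} for r in records]


rows = [
    {"user": "a", "amount": 10.0},
    {"user": "a", "amount": 30.0},
    {"user": "b", "amount": 4.0},
]

scored = [score_record(r) for r in rows]           # can run one record at a time
enriched = add_group_mean(rows, "user", "amount")  # needs the whole frame
```

A scoring runtime that receives records one at a time can evaluate `score_record`, but it has no way to compute `amount_mean` without first seeing the other rows.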

If you need to represent complete data processing pipelines (where the ML model is just one part of the workflow), then you need to look for other or combined standards. Perhaps SQL paired with PMML would be a good choice. The idea is that you perform data aggregation outside of the ML model, not inside it (e.g., a SQL database will be much better at it than any PMML or PFA runtime).
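A minimal sketch of that split, assuming an in-memory SQLite database stands in for the real SQL engine and a hand-written function stands in for a model exported to PMML (the table name, columns and weights are all hypothetical). The `GROUP BY` runs in SQL, and the model only ever sees one already-aggregated record at a time.

```python
import sqlite3

# The SQL engine does the frame-based work (aggregation).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 4.0)],
)

features = conn.execute(
    """
    SELECT user_id, COUNT(*) AS n_tx, SUM(amount) AS total
    FROM transactions
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
conn.close()


# Stand-in for a PMML/PFA model: a pure function of one feature record.
def predict(n_tx: int, total: float) -> float:
    return 0.5 * n_tx + 0.25 * total


predictions = {user: predict(n_tx, total) for user, n_tx, total in features}
```

In a real deployment the SQL would run in the database and `predict` would be replaced by a call into a PMML evaluator; the point of the sketch is only where the boundary between the two sits.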

Answered by Elmar Macek on 06 February 2019 at 13:17

I see two options with regard to MLeap:

1) Implement DataFrame-based transformers and an MLeap equivalent of Spark's SQLTransformer. Conceptually this seems to be the best solution (since you can always encapsulate such transformations in a pipeline element), but to be honest it is also a lot of work. See https://github.com/combust/mleap/issues/126

2) Extend the DefaultMleapFrame with the operations you want to perform, and then apply them to the data handed to the REST server within a modified MleapServing subproject.
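Option 1 can be illustrated with a rough, hypothetical sketch. This is not MLeap's API or bundle format: `SqlFrameTransformer` is an invented name, and SQLite stands in for whatever engine would evaluate the statement at serving time. The stage is loosely modeled on Spark's SQLTransformer, where the whole transformation is one SQL statement and `__THIS__` stands for the input frame. Because the stage's entire definition is a string, it can be written to a file and reloaded.

```python
import json
import sqlite3


class SqlFrameTransformer:
    """Hypothetical frame-based pipeline stage (not an MLeap class)."""

    def __init__(self, statement: str):
        self.statement = statement

    # The stage's full definition is a string, so exporting it is trivial.
    def to_json(self) -> str:
        return json.dumps({"type": "sql", "statement": self.statement})

    @classmethod
    def from_json(cls, payload: str) -> "SqlFrameTransformer":
        return cls(json.loads(payload)["statement"])

    def transform(self, columns: list[str], rows: list[tuple]) -> list[tuple]:
        # Load the whole input frame into a throwaway in-memory table,
        # then run the statement against it. Column names are assumed trusted.
        conn = sqlite3.connect(":memory:")
        try:
            conn.execute(f"CREATE TABLE this_frame ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(f"INSERT INTO this_frame VALUES ({placeholders})", rows)
            sql = self.statement.replace("__THIS__", "this_frame")
            return conn.execute(sql).fetchall()
        finally:
            conn.close()


stage = SqlFrameTransformer(
    "SELECT user_id, SUM(amount) AS total FROM __THIS__ "
    "GROUP BY user_id ORDER BY user_id"
)
exported = stage.to_json()  # could be saved next to the exported model
restored = SqlFrameTransformer.from_json(exported)
result = restored.transform(["user_id", "amount"], [("a", 10.0), ("a", 30.0), ("b", 4.0)])
```

The hard part in a real implementation is the one the answer alludes to: the serving runtime needs its own engine that can evaluate the statement without a Spark session.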

I actually went with 2) and added implode, explode and join as methods on the DefaultMleapFrame, plus a HashIndexedMleapFrame that allows fast joins. I did not implement groupby and agg, but in Scala these are relatively easy to add.
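The operations described above can be sketched in plain Python over lists of dicts. This is a hypothetical illustration of the ideas, not the answerer's code and not MLeap's API: `HashIndexedFrame`, `join`, `explode`, `implode` and `group_by_agg` are invented names. The hash index is built once, so each join lookup is a dictionary access instead of a scan of the right-hand side.

```python
from collections import defaultdict


class HashIndexedFrame:
    """Hypothetical frame: rows plus a prebuilt hash index on one key column."""

    def __init__(self, rows: list[dict], key: str):
        self.rows = rows
        self.key = key
        self.index = defaultdict(list)
        for row in rows:
            self.index[row[key]].append(row)


def join(left: list[dict], right: HashIndexedFrame) -> list[dict]:
    # Inner join: one index lookup per left row; unmatched rows are dropped.
    return [{**l, **r} for l in left for r in right.index.get(l[right.key], [])]


def explode(rows: list[dict], column: str) -> list[dict]:
    # One output row per element of a list-valued column (empty lists vanish).
    return [{**row, column: item} for row in rows for item in row[column]]


def group_by_agg(rows: list[dict], key: str, column: str, agg) -> dict:
    # Group values of `column` by `key`, then apply an aggregate (sum, len, ...).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {k: agg(v) for k, v in groups.items()}


def implode(rows: list[dict], key: str, column: str) -> list[dict]:
    # Inverse of explode: gather `column` values back into one list per key.
    return [{key: k, column: v} for k, v in group_by_agg(rows, key, column, list).items()]


users = HashIndexedFrame(
    [{"user_id": "a", "country": "DE"}, {"user_id": "b", "country": "FR"}], key="user_id"
)
events = [
    {"user_id": "a", "tags": ["x", "y"]},
    {"user_id": "b", "tags": ["z"]},
    {"user_id": "c", "tags": []},
]

exploded = explode(events, "tags")
joined = join(exploded, users)
counts = group_by_agg(joined, "country", "tags", len)
imploded = implode(exploded, "user_id", "tags")
```

Building the index costs one pass over the right-hand frame, which pays off when the same lookup table (for example, user attributes loaded at server start-up) is joined against many incoming requests.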
