Feature Selection, Outlier Removal, Target Transformer for Dask-ML pipelines


While Feature Selection (FS), Outlier Removal (OR), and Target Transformation (TT) have well-established components in "classic" scikit-learn pipelines, the dask-ml and RAPIDS documentation omits them entirely.

What are the best practices for implementing Feature Selection, Outlier Removal, and Target Transformation in dask-ml when training on large distributed datasets in production? Are there any existing packages that cover at least a subset of the relevant sklearn functionality and are compatible with dask-ml/RAPIDS?

I wasn't able to find anything, and I'm wondering why: in my experience these components can be quite important for modelling. Granted, I can do the Target Transformations manually, and I can get away without Outlier Removal to start with, but Feature Selection is an absolute must given how many features I have on a cluster.
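To make concrete what I mean by "manually", here is a minimal sketch of the kind of hand-rolled helpers I have in mind. It is written against NumPy so it runs anywhere; the function names (`variance_mask`, `transform_target`, `inverse_transform_target`) are hypothetical and not part of dask-ml or RAPIDS. `dask.array` also provides `log1p`, `expm1` and `var`, so I assume the same logic carries over to Dask arrays, with the mask computed eagerly before it is used to index columns.

```python
import numpy as np


def variance_mask(X, threshold=0.0):
    """Boolean mask of columns whose variance exceeds `threshold`.

    Hypothetical helper: a crude stand-in for variance-based feature selection.
    """
    return X.var(axis=0) > threshold


def transform_target(y):
    """Log-transform a non-negative target before fitting (hypothetical helper)."""
    return np.log1p(y)


def inverse_transform_target(y_t):
    """Map values (e.g. predictions) back to the original target scale."""
    return np.expm1(y_t)


# Toy data: the middle column is constant and should be dropped.
X = np.array([[1.0, 5.0, 0.0],
              [2.0, 5.0, 1.0],
              [3.0, 5.0, 0.0]])
y = np.array([0.0, 9.0, 99.0])

mask = variance_mask(X)
X_selected = X[:, mask]

y_t = transform_target(y)
y_back = inverse_transform_target(y_t)
```

This only covers the simplest cases (a variance filter and a log transform); what I'm really after is something comparable to sklearn's selectors and target-transforming wrappers that works inside a distributed pipeline.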
