Normalizing timeseries before using tsfresh transformer in sklearn pipeline


I'm using tsfresh's scikit-learn transformer, FeatureAugmenter, in a pipeline and want to normalize my timeseries before extracting features. My dataset consists of multiple samples, each containing multiple timeseries.

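from sklearn import ensemble, pipeline
from tsfresh.transformers import FeatureAugmenter

# globals.dataset is project-specific configuration (not shown here).


def build_timeseries_pipeline():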
    regressor = ensemble.RandomForestRegressor()
    return pipeline.Pipeline(
        [
            (
                "augmenter",
                FeatureAugmenter(
                    column_id=globals.dataset.column_names["subject_id"],
                    column_sort=globals.dataset.column_names["time"],
                    default_fc_parameters=globals.dataset.tsfresh_features,
                    n_jobs=16,
                ),
            ),
            ("regressor", regressor),
        ]
    )
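For context, FeatureAugmenter does not receive the timeseries through X. X is a DataFrame whose index holds the sample ids, and the actual timeseries live in a separate container DataFrame that the augmenter looks up by id. The sketch below shows those two inputs under assumed column names (`subject_id`, `time`, `sensor_a`) and a made-up target `y`. The fitting calls are left as comments because they need tsfresh and the project's own `globals.dataset` configuration. It also shows why any step placed before the augmenter has nothing to work on: X has no data columns at all.

```python
import pandas as pd

# Hypothetical timeseries container in tsfresh's flat (wide) format:
# one row per (subject, time) pair, one column per timeseries kind.
# The column names are assumptions for illustration only.
df_ts = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2],
        "time": [0, 1, 2, 0, 1, 2],
        "sensor_a": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)

# Hypothetical regression target, one value per subject id.
y = pd.Series([0.5, 1.5], index=[1, 2])

# X carries only the subject ids (as its index) and no data columns;
# FeatureAugmenter uses these ids to select rows from the container.
X = pd.DataFrame(index=y.index)

# With tsfresh installed, the pipeline would then be fitted like this:
#   pipe = build_timeseries_pipeline()
#   pipe.set_params(augmenter__timeseries_container=df_ts)
#   pipe.fit(X, y)
```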

This is my working code, but I want to do something like the code below.

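from sklearn import ensemble, pipeline
from sklearn.preprocessing import Normalizer
from tsfresh.transformers import FeatureAugmenter

# globals.dataset is project-specific configuration (not shown here).


def build_timeseries_pipeline():
    regressor = ensemble.RandomForestRegressor()
    return pipeline.Pipeline(
        [
            # Does not work: this step receives X, which holds only the sample ids.
            ("normalize", Normalizer()),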
            (
                "augmenter",
                FeatureAugmenter(
                    column_id=globals.dataset.column_names["subject_id"],
                    column_sort=globals.dataset.column_names["time"],
                    default_fc_parameters=globals.dataset.tsfresh_features,
                    n_jobs=16,
                ),
            ),
            ("regressor", regressor),
        ]
    )

Of course this does not work, since the pipeline only receives the sample ids (as the index of X), not the actual timeseries data; the timeseries are handed to the FeatureAugmenter separately as its timeseries container. But I want to be able to preprocess each sample individually as a DataFrame before extracting features with the FeatureAugmenter.
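One possible workaround, sketched here under assumptions: normalizing a sample uses only that sample's own values, so nothing has to be learned inside the pipeline, and the normalization can be applied to the timeseries container before it is handed to FeatureAugmenter. The hypothetical helper `normalize_per_sample` (not part of tsfresh or scikit-learn) z-normalizes every timeseries column within each id, assuming tsfresh's flat (wide) format with one column per timeseries kind. Note that sklearn's `Normalizer` would not do this even if it received the data: it rescales each row to unit norm, i.e. each time point across kinds, not each series over time.

```python
import numpy as np
import pandas as pd


def normalize_per_sample(df_ts, column_id, column_sort):
    """Z-normalize each timeseries column separately within every sample.

    Hypothetical helper. Assumes the flat/wide format: one row per
    (id, time) pair and one numeric column per timeseries kind.
    """
    value_columns = [c for c in df_ts.columns if c not in (column_id, column_sort)]
    values = df_ts[value_columns]
    ids = df_ts[column_id]

    # Per-sample mean of each series, broadcast back to every row of that sample.
    mean = values.groupby(ids).transform("mean")
    centered = values - mean
    # Per-sample population standard deviation (ddof=0) of each series.
    std = np.sqrt((centered ** 2).groupby(ids).transform("mean"))
    # A constant series has std 0; divide by 1 instead so it maps to all zeros.
    std = std.replace(0.0, 1.0)

    out = df_ts.copy()  # leave the caller's DataFrame untouched
    out[value_columns] = centered / std
    return out


# Example container with assumed column names: two subjects, two sensors.
df_ts = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2],
        "time": [0, 1, 2, 0, 1, 2],
        "sensor_a": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
        "sensor_b": [5.0, 5.0, 5.0, 0.0, 2.0, 4.0],
    }
)
df_ts_norm = normalize_per_sample(df_ts, column_id="subject_id", column_sort="time")
```

The normalized frame would then replace the original container, e.g. `pipe.set_params(augmenter__timeseries_container=df_ts_norm)`. For data in tsfresh's stacked (long) format, the grouping would be by both the id and the kind column instead. If the preprocessing had to learn parameters from the training data, this upfront approach would not be enough and a custom pipeline step would be needed.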

Is this possible?

Any ideas for a clean solution will be greatly appreciated!
