I'm using the sklearn transformer for tsfresh in a pipeline, and I want to normalize my time series before extracting features. My dataset consists of multiple samples, each containing multiple time series.
from sklearn import ensemble, pipeline
from tsfresh.transformers import FeatureAugmenter


def build_timeseries_pipeline():
    regressor = ensemble.RandomForestRegressor()
    return pipeline.Pipeline(
        [
            (
                "augmenter",
                FeatureAugmenter(
                    column_id=globals.dataset.column_names["subject_id"],
                    column_sort=globals.dataset.column_names["time"],
                    default_fc_parameters=globals.dataset.tsfresh_features,
                    n_jobs=16,
                ),
            ),
            ("regressor", regressor),
        ]
    )
This is my working code, but I want to do something like the code below:
def build_timeseries_pipeline():
    regressor = ensemble.RandomForestRegressor()
    return pipeline.Pipeline(
        [
            ("normalize", Normalizer()),
            (
                "augmenter",
                FeatureAugmenter(
                    column_id=globals.dataset.column_names["subject_id"],
                    column_sort=globals.dataset.column_names["time"],
                    default_fc_parameters=globals.dataset.tsfresh_features,
                    n_jobs=16,
                ),
            ),
            ("regressor", regressor),
        ]
    )
Of course, this does not work, because the X passed to the pipeline contains only the sample indexes, not the actual time series data. But I want to be able to preprocess each sample individually as a DataFrame before extracting features with the FeatureAugmenter.
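To make concrete what I mean by preprocessing each sample individually, here is a minimal sketch using only pandas. The function name `normalize_per_sample`, the toy column names, and the choice of z-scoring are my own assumptions for illustration, not anything provided by tsfresh or sklearn. It scales every value column within each sample (all rows sharing one id), which is the step I would like to run on the time series data before the FeatureAugmenter sees it.

```python
import pandas as pd


def normalize_per_sample(df, column_id, value_columns):
    """Z-score each value column separately within every sample.

    Hypothetical helper: a "sample" is the group of rows sharing one id.
    """
    out = df.copy()
    # Promote to float so integer columns can hold the scaled values.
    out[value_columns] = out[value_columns].astype(float)
    grouped = out.groupby(column_id)[value_columns]
    mean = grouped.transform("mean")
    # Constant or single-row samples have std 0 or NaN; divide by 1 instead.
    std = grouped.transform("std").replace(0.0, 1.0).fillna(1.0)
    out[value_columns] = (out[value_columns] - mean) / std
    return out


# Toy long-format data: two samples with three time steps each.
df = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2],
        "time": [0, 1, 2, 0, 1, 2],
        "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)
normalized = normalize_per_sample(df, "subject_id", ["value"])
print(normalized)
```

Doing this by hand on the time series DataFrame before handing it to the augmenter (for example through `set_timeseries_container`) seems workable, but then the normalization lives outside the pipeline, which is what I am trying to avoid.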
Is this possible?
Any ideas for a clean solution will be greatly appreciated!