TSFRESH - features extracted by a symmetric sliding window


As raw data we have measurements m_{i,j}, measured every 30 seconds (i=0, 30, 60, 90,...720,..) for every subject j in the dataset.

I wish to use the TSFRESH package to extract time-series features such that, for a point of interest at time i, the features are calculated over a symmetric rolling window.

We wish to calculate the feature vector of time point i,j based on the measurements from 3 hours of context before i and 3 hours after i. Thus, each window of 721 measurements represents a point of interest surrounded by 6 hours of “context”, i.e. 360 measurements before and 360 measurements after the point of interest. For every point of interest, features should be extracted from these 721 measurements of m_{i,j}.

I've tried using the rolling_direction parameter of roll_time_series(), but the only options are to roll either backwards or forwards in “time”. I'm looking for a way to include both "past" and "future" data in the feature calculation.


There are 2 best solutions below

On BEST ANSWER

A "workaround" solution:

Use the "roll_time_series" function twice: once for "backward" rolling (setting rolling_direction=1) and once for "forward" rolling (rolling_direction=-1), and then combine the two results into one.

This will provide, for each time point in the original dataset m_{i,j}, a rolled time series window with up to 360 values "from the past" and up to 360 values "from the future" (i.e., the time point is at the center of the window and max_timeshift=360).

Note the use of the pandas functions concat(), sort_values() and drop_duplicates() below; they are required for this solution to work.

import numpy as np
import pandas as pd
from tsfresh.utilities.dataframe_functions import roll_time_series
from tsfresh.feature_extraction import EfficientFCParameters, MinimalFCParameters

# activity_data, id_column and sort_column are assumed to be defined already:
# the input dataframe and the names of its id and time columns.

# "backward" windows: each time point together with up to 360 past values
rolled_backward = roll_time_series(activity_data,
                                   column_id=id_column,
                                   column_sort=sort_column,
                                   column_kind=None,
                                   rolling_direction=1,
                                   max_timeshift=360)

# "forward" windows: each time point together with up to 360 future values
rolled_forward = roll_time_series(activity_data,
                                  column_id=id_column,
                                  column_sort=sort_column,
                                  column_kind=None,
                                  rolling_direction=-1,
                                  max_timeshift=360)

# merge into one dataframe, with the rolled_forward and rolled_backward window for every time point (sample)
df = pd.concat([rolled_backward, rolled_forward])

# important! - sort and drop duplicates (the center point appears in both windows)
df.sort_values(by=[id_column, sort_column], inplace=True)
df.drop_duplicates(subset=[id_column, sort_column, 'activity'], inplace=True, keep='first')
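
To make the combining step easier to follow, here is a minimal pure-Python sketch of the same idea on a plain list. It does not use tsfresh; the helper name symmetric_window and the toy series are made up for illustration only. It shows why the center point has to be dropped once (it appears in both the backward and the forward window) and that windows near the start or end of a series come out shorter than the full width.

```python
def symmetric_window(values, center, half_width):
    """Return the values within half_width positions on either side of center."""
    # "Backward" part: up to half_width past values plus the center itself.
    backward = values[max(0, center - half_width): center + 1]
    # "Forward" part: the center itself plus up to half_width future values.
    forward = values[center: center + half_width + 1]
    # The center appears in both parts, so keep it only once when combining.
    return backward + forward[1:]


series = list(range(10))
print(symmetric_window(series, 5, 2))  # [3, 4, 5, 6, 7]
print(symmetric_window(series, 0, 2))  # [0, 1, 2] (truncated at the start)
print(symmetric_window(series, 9, 2))  # [7, 8, 9] (truncated at the end)
```

With half_width=360 this corresponds to the 721-measurement window from the question, except at the edges of each subject's series, where fewer values are available.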
On

If I understand your idea correctly, it is even possible to do this with only one-sided rolling. Let's try with one example:

You want to predict for the time 8:00, and for this you need the data from 5:00 until 11:00. If you roll through the data with a window size of 6h and a positive rolling direction, you will end up with a dataset that also includes exactly this part of the data (5:00 to 11:00). Normally, it would be used to train for the value at 11:00 (or 12:00), but nothing prevents you from using it to predict the value at 8:00. Basically, it is just a matter of re-indexing.

(The same is actually true for a negative rolling direction.)
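
The re-indexing idea can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: it works on a list rather than a tsfresh-rolled dataframe, keeps only full-width windows, and the helper names trailing_windows and recenter are hypothetical. Each one-sided window that ends at index end is simply relabeled with the index in its middle, end - half_width.

```python
def trailing_windows(values, width):
    """Yield (end_index, window) for every full one-sided window of the given width."""
    for end in range(width - 1, len(values)):
        yield end, values[end - width + 1: end + 1]


def recenter(values, half_width):
    """Relabel each full trailing window by the index at its center."""
    width = 2 * half_width + 1
    return {end - half_width: window
            for end, window in trailing_windows(values, width)}


series = [10, 11, 12, 13, 14, 15, 16]
centered = recenter(series, half_width=2)
print(centered[2])       # [10, 11, 12, 13, 14]
print(sorted(centered))  # [2, 3, 4]
```

In the question's setting, half_width would be 360 and the window width 721. Points closer than 360 measurements to either end of a series get no full window in this sketch.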