I have a dataframe. I would like to extract features based on a time window.
import pandas as pd
df = pd.DataFrame({'time':[1,2,3,4,5,6,7,8,9,10,2,3,5,6,8,10,12],
'id':[793,793,793,793,793,793,793,793,793,793,942,942,942,942,942,942,942],
'B1':[10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55],
'B2':[10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55],
'B3':[10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55]})
time_window = pd.DataFrame({'time':[2,4,6,8,5,8], 'id':[793,793,793,793,942,942]})
Here, my time windows are:
[2,4] --> for participant 793
[6,8] --> for participant 793
[5,8] --> for participant 942
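The `time_window` table stores each window's start and end on two consecutive rows for the same participant. A minimal sketch for turning it into one row per window, assuming the rows always come in (start, end) pairs in that order; the names `starts`, `ends`, and `windows` are hypothetical:

```python
import pandas as pd

time_window = pd.DataFrame({'time': [2, 4, 6, 8, 5, 8],
                            'id': [793, 793, 793, 793, 942, 942]})

# Assumption: rows alternate start, end, so even positions hold starts
# and odd positions hold the matching ends.
starts = time_window.iloc[0::2].reset_index(drop=True)
ends = time_window.iloc[1::2].reset_index(drop=True)

# Sanity check: each start/end pair must belong to the same participant.
assert (starts['id'] == ends['id']).all(), "start/end rows belong to different ids"

windows = pd.DataFrame({'id': starts['id'],
                        'start': starts['time'],
                        'end': ends['time']})
print(windows)
```

Each row of `windows` then corresponds to one line of the list above, which makes it easy to loop over windows instead of indexing `time_window` by position.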
My goal is to extract the features within the specified time windows for each participant, so I wrote this function:
from tsfresh import extract_features

def apply_tsfresh(col):
    for i in range(len(time)):
        col.loc[time_window[i]:time_window[i+1]] = extract_features(col.loc[time_window[i]:time_window[i+1]], column_id="id")
    return col

extracted_freatures = df.set_index('time').apply(apply_tsfresh)
It should extract the features within the specified time windows for each participant. However, I am not getting any results; instead, it raises an error.
Could you please help me here? I am out of ideas.
My desired output should look like this: [desired result]
*There may be more than two extracted features, and their values may differ. This is just an example.
First, an empty dataframe 'extracted_freatures_' is created. Then a loop runs with a step of two, taking elements from the 'time' column of the 'time_window' dataframe. The results from 'extract_features' are appended to the 'extracted_freatures_' dataframe. Don't ask me how 'tsfresh' works; I don't know.
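The loop described above can be sketched as follows. This is not the answer's original code (which is not shown); it assumes `time_window` rows come in start/end pairs and that each window should only include that participant's rows. So that it runs without tsfresh installed, it takes the feature extractor as a parameter and is demonstrated with a simple per-id mean as a hypothetical stand-in. The names `extract_window_features` and `stand_in` are hypothetical.

```python
import pandas as pd

def extract_window_features(df, time_window, extractor):
    """Apply `extractor` to each participant's rows inside each [start, end] window.

    `extractor` is any callable that takes a long-format DataFrame with
    'id' and 'time' columns and returns one row of features per id.
    """
    results = []
    # Step through time_window two rows at a time: (start row, end row).
    for i in range(0, len(time_window), 2):
        start = time_window['time'].iloc[i]
        end = time_window['time'].iloc[i + 1]
        pid = time_window['id'].iloc[i]
        # Keep only this participant's rows whose time lies in [start, end].
        mask = (df['id'] == pid) & df['time'].between(start, end)
        feats = extractor(df[mask])
        # Record the window bounds, since the same id can appear in several windows.
        results.append(feats.assign(start=start, end=end))
    return pd.concat(results)

df = pd.DataFrame({'time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 5, 6, 8, 10, 12],
                   'id': [793] * 10 + [942] * 7,
                   'B1': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 23, 24, 25, 27, 30, 44, 55],
                   'B2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 23, 24, 25, 27, 30, 44, 55],
                   'B3': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 23, 24, 25, 27, 30, 44, 55]})
time_window = pd.DataFrame({'time': [2, 4, 6, 8, 5, 8],
                            'id': [793, 793, 793, 793, 942, 942]})

# Hypothetical stand-in for tsfresh: the mean of each B column per id.
stand_in = lambda sub: sub.groupby('id')[['B1', 'B2', 'B3']].mean()

out = extract_window_features(df, time_window, stand_in)
print(out)
```

To use tsfresh itself, the extractor could be `lambda sub: extract_features(sub, column_id="id", column_sort="time")` after `from tsfresh import extract_features`; tsfresh returns one row per id, so the added `start`/`end` columns keep the two windows of participant 793 apart. With only a few points per window, some tsfresh features may come out as NaN.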
Output