Python, extracting features from time series (TSFRESH package or what can I use?)


I need some help for feature extraction in time series, maybe using the TSFRESH package.

I have about 5000 CSV files, each of which is a single time series (they may differ in length). The CSV time series format is pretty straightforward:

Example of a CSV-Time-Series file:

| Date | Value |
| ------ | ----- |
| 1/1/1904 01:00:00,000000 | 1,464844E-3 |
| 1/1/1904 01:00:01,000000 | 1,953125E-3 |
| 1/1/1904 01:00:02,000000 | 4,882813E-4 |
| 1/1/1904 01:00:03,000000 | -2,441406E-3 |
| 1/1/1904 01:00:04,000000 | -9,765625E-4 |
| ... | ... |
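The values above use a decimal comma, so they need converting before any numeric work. Here is a minimal sketch of that parsing step using only the standard library. The `;` column delimiter, the day/month/year order, and the helper names (`parse_timestamp`, `parse_value`, `read_series`) are assumptions for illustration; the real files may differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: values copied from the example table, with an assumed
# ";" column delimiter (the real delimiter is not shown in the question).
SAMPLE = """Date;Value
1/1/1904 01:00:00,000000;1,464844E-3
1/1/1904 01:00:01,000000;1,953125E-3
1/1/1904 01:00:02,000000;4,882813E-4
1/1/1904 01:00:03,000000;-2,441406E-3
"""


def parse_timestamp(text):
    # Assumes day/month/year order; "1/1/1904" alone cannot confirm it.
    return datetime.strptime(text, "%d/%m/%Y %H:%M:%S,%f")


def parse_value(text):
    # Swap the decimal comma for a dot so float() accepts the number.
    return float(text.replace(",", "."))


def read_series(handle, delimiter=";"):
    """Return a list of (timestamp, value) pairs from one series file."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row
    return [(parse_timestamp(d), parse_value(v)) for d, v in reader]


rows = read_series(io.StringIO(SAMPLE))
```

If pandas is used instead, `pandas.read_csv` accepts `sep` and `decimal` arguments that cover the same delimiter and decimal-comma handling.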

Along with these CSV files, I also have a metadata file (in a CSV format), where each row refers to one of those 5000 CSV-time-series, and reports more general information about that time series such as the energy, etc.

Example of the metadata-CSV file:

| Path of the CSV-timeseries | Label | Energy | Penetration | Porosity |
| ------ | ----- | ------ | ----- | ----- |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |

The most important column is the "Label" one since it reports if a CSV-time-series was labeled as:

  1. Good
  2. Bad

I should also consider the energy, penetration, and porosity columns, since those values play a big role in how a time series is labeled. (I already tried a decision tree on those metadata columns alone; now I would like to analyze the time series themselves to extract knowledge.)

I intend to extract features from the time series so that I can understand which features cause a time series to be labeled "Good" or "Bad".

How can I do this with TSFRESH? Are there other ways to do this?

Could you show me how to do it? Thank you :)

Best answer

I'm currently doing something similar, and this example Jupyter notebook from GitHub helped me.

In short, the basic process is:

  1. Bring the time series into a format tsfresh accepts (for example, one long DataFrame with an id column identifying each series); see the tsfresh documentation for more information.
  2. Extract features from the time series using `X = extract_features(...)`.
  3. Select the relevant features using `X_filtered = select_features(X, y)`, where `y` is your label, encoded e.g. as 1 for good and 0 for bad.
  4. Feed the selected features into a classifier, as also shown in the Jupyter notebook.