When using extract_relevant_features of tsfresh, I am getting this error message:

features_filtered_direct = extract_relevant_features(df.notnull(),
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\convenience\relevant_extraction.py", line 172, in extract_relevant_features
    raise ValueError(
ValueError: The following ids are in the time series container but are missing in y: {True}

Here is the dataframe printed just before the error:

0   2023-01-04       0      3           2   64.0      27      14      24      35      36      16
1   2023-01-08       1      3           4   68.0      52      04      14      25      65      21
2   2023-01-11       0      3           2   84.0      71      72      94      15      66       2
3   2023-01-15       1      3           2   93.0      90      11      31      13      74      11
4   2023-01-18       0      3           3   95.0      30      52      03      45      07       9
..         ...     ...    ...         ...          ...     ...     ...     ...     ...     ...     ...

I don't exactly understand the meaning of the error message. So, I don't know how to resolve it.

UPDATE 07/31/2023

When df.notnull() is replaced with df, here is the error message:

Traceback (most recent call last):
  File "C:\Users\username\anaconda3\envs\ts-env\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 43, in _function_with_partly_reduce
    results = list(itertools.chain.from_iterable(results))
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 42, in <genexpr> 
    results = (map_function(chunk, **kwargs) for chunk in chunk_list)
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 386, in _do_extraction_on_chunk
    return list(_f())
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 372, in _f
    result = [("", func(x))]
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py", line 1704, in sample_entropy
    if np.isnan(x).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:/Users/username/OneDrive/Desktop/Projects/ScoreCalculator/score.py", line 141, in <module>
    main()
  File "c:/Users/username/OneDrive/Desktop/Projects/ScoreCalculator/score.py", line 101, in main
    ml_modelling_classical(data_alternative_l)
  File "c:\Users\username\OneDrive\Desktop\Projects\ScoreCalculator\utilities.py", line 1173, in ml_modelling_classical
    features_filtered_direct = extract_relevant_features(df,
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\convenience\relevant_extraction.py", line 182, in extract_relevant_features
    X_ext = extract_features(
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 164, in extract_features
    result = _do_extraction(
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 294, in _do_extraction
    result = distributor.map_reduce(
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 241, in map_reduce
    result = list(itertools.chain.from_iterable(result))
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "C:\Users\username\anaconda3\envs\ts-env\lib\multiprocessing\pool.py", line 868, in next
    raise value
  File "C:\Users\username\anaconda3\envs\ts-env\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 43, in _function_with_partly_reduce
    results = list(itertools.chain.from_iterable(results))
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 42, in <genexpr>
    results = (map_function(chunk, **kwargs) for chunk in chunk_list)
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 386, in _do_extraction_on_chunk
    return list(_f())
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 372, in _f
    result = [("", func(x))]
  File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py", line 1704, in sample_entropy
    if np.isnan(x).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

There is 1 solution below

From the information you've provided, the issue seems to lie in this line of code:

features_filtered_direct = extract_relevant_features(df.notnull(),

df.notnull() does not filter anything. It returns a DataFrame of the same shape in which each cell is either True (the original value was not null) or False (the original value was null).
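To make that concrete, here is a minimal sketch on a made-up frame (the column names `id` and `value` are hypothetical, since your real column names are not shown). Every column of the result, including the id column, ends up with boolean dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; the column names are made up for illustration.
df = pd.DataFrame({
    "id": [10, 20, 10],
    "value": [64.0, np.nan, 84.0],
})

# notnull() keeps the shape and replaces every cell by True/False.
mask = df.notnull()
print(mask)
print(mask.dtypes)
```

The original numbers are gone from `mask`; only the information "was this cell null or not" survives.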

The error you are facing - ValueError: The following ids are in the time series container but are missing in y: {True} - means that tsfresh found the id True in the time series container but could not find it in the index of your target y. That True comes from df.notnull(): the id column now contains only booleans, so tsfresh treats True as a time series id, which I assume is not your intention.
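The mismatch can be reproduced without tsfresh. The helper below is a simplified stand-in for the consistency check, not tsfresh's actual implementation, and the frame, the target `y`, and the name `ids_missing_in_y` are all hypothetical:

```python
import pandas as pd

# Hypothetical data: two series (ids 10 and 20) and a target y indexed by those ids.
df = pd.DataFrame({
    "id": [10, 20, 10, 20],
    "time": [0, 0, 1, 1],
    "value": [64.0, 68.0, 84.0, 93.0],
})
y = pd.Series([0, 1], index=[10, 20])


def ids_missing_in_y(frame, y, column_id="id"):
    """Return the ids that occur in the container but not in y's index."""
    return set(frame[column_id]) - set(y.index)


# With the original frame every id has a matching entry in y.
print(ids_missing_in_y(df, y))

# With the boolean mask the only "id" left is True, which y does not know.
print(ids_missing_in_y(df.notnull(), y))
```

The second call returns a set containing only True, which is the same set that shows up in your error message.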

Based on your explanation, I suspect that you intended to use df.dropna() instead of df.notnull(). df.dropna() removes every row that contains a null value and leaves the remaining values untouched, so the data is clean before it is passed to extract_relevant_features.

Try replacing df.notnull() with df.dropna() and see if the error still occurs.
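As a rough sketch of what changes with dropna(), again on made-up data with hypothetical column names: the row with the missing value disappears, while the ids and values of the other rows are kept as they were.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value.
df = pd.DataFrame({
    "id": [10, 10, 20, 20],
    "time": [0, 1, 0, 1],
    "value": [64.0, np.nan, 84.0, 93.0],
})

# dropna() removes the incomplete row; ids stay real ids, not booleans.
cleaned = df.dropna()
print(cleaned)
print(sorted(cleaned["id"].unique().tolist()))
```

Note that dropna() can shorten a series (or remove it entirely if all of its rows contain nulls), so it is worth checking that every id in y still has rows in `cleaned`.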


PS:

You wrote:

Here is the dataframe printed just before the error:

But this is not correct. The DataFrame you printed is df, while the one you pass to extract_relevant_features is the result of df.notnull(), which is created inside the function call. For easier debugging, store the transformed DataFrame in a separate variable first, so that you can print and inspect what you actually pass to the function.
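Inspecting the dtypes of that variable may also help with the TypeError from your update. np.isnan raises exactly this kind of TypeError when it receives an object-dtype array, for example a column of strings. The sketch below assumes that one of your value columns was read as strings; entries such as 04 and 03 in your printout hint at that, but it is only a guess, and the column names here are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "value" holds strings, so its dtype is object.
df = pd.DataFrame({
    "id": [10, 10, 20, 20],
    "time": [0, 1, 0, 1],
    "value": pd.Series(["27", "04", "71", None], dtype=object),
})

# Keep the transformed frame in its own variable so it can be inspected.
prepared = df.dropna()
print(prepared.dtypes)

# np.isnan cannot handle object arrays and raises a TypeError.
try:
    np.isnan(prepared["value"].to_numpy())
    isnan_failed = False
except TypeError:
    isnan_failed = True
print("np.isnan failed on the object column:", isnan_failed)

# Convert the column to a numeric dtype before passing the frame on.
prepared = prepared.assign(value=pd.to_numeric(prepared["value"]))
print(prepared.dtypes)
```

If `prepared.dtypes` shows object for any column that tsfresh treats as a value column, converting it (or excluding it from the frame) before the call is the first thing I would try.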