When using extract_relevant_features of tsfresh, I am getting this error message:
features_filtered_direct = extract_relevant_features(df.notnull(),
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\convenience\relevant_extraction.py", line 172, in extract_relevant_features
raise ValueError(
ValueError: The following ids are in the time series container but are missing in y: {True}
Here is the dataframe printed just before the error:
0 2023-01-04 0 3 2 64.0 27 14 24 35 36 16
1 2023-01-08 1 3 4 68.0 52 04 14 25 65 21
2 2023-01-11 0 3 2 84.0 71 72 94 15 66 2
3 2023-01-15 1 3 2 93.0 90 11 31 13 74 11
4 2023-01-18 0 3 3 95.0 30 52 03 45 07 9
.. ... ... ... ... ... ... ... ... ... ... ...
I don't fully understand what the error message means, so I don't know how to resolve it.
UPDATE 07/31/2023
When df.notnull() is replaced with df, here is the error message:
Traceback (most recent call last):
File "C:\Users\username\anaconda3\envs\ts-env\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 43, in _function_with_partly_reduce
results = list(itertools.chain.from_iterable(results))
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 42, in <genexpr>
results = (map_function(chunk, **kwargs) for chunk in chunk_list)
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 386, in _do_extraction_on_chunk
return list(_f())
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 372, in _f
result = [("", func(x))]
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py", line 1704, in sample_entropy
if np.isnan(x).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:/Users/username/OneDrive/Desktop/Projects/ScoreCalculator/score.py", line 141, in <module>
main()
File "c:/Users/username/OneDrive/Desktop/Projects/ScoreCalculator/score.py", line 101, in main
ml_modelling_classical(data_alternative_l)
File "c:\Users\username\OneDrive\Desktop\Projects\ScoreCalculator\utilities.py", line 1173, in ml_modelling_classical
features_filtered_direct = extract_relevant_features(df,
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\convenience\relevant_extraction.py", line 182, in extract_relevant_features
X_ext = extract_features(
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 164, in extract_features
result = _do_extraction(
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 294, in _do_extraction
result = distributor.map_reduce(
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 241, in map_reduce
result = list(itertools.chain.from_iterable(result))
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tqdm\std.py", line 1178, in __iter__
for obj in iterable:
File "C:\Users\username\anaconda3\envs\ts-env\lib\multiprocessing\pool.py", line 868, in next
raise value
File "C:\Users\username\anaconda3\envs\ts-env\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 43, in _function_with_partly_reduce
results = list(itertools.chain.from_iterable(results))
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\utilities\distribution.py", line 42, in <genexpr>
results = (map_function(chunk, **kwargs) for chunk in chunk_list)
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 386, in _do_extraction_on_chunk
return list(_f())
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 372, in _f
result = [("", func(x))]
File "C:\Users\username\anaconda3\envs\ts-env\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py", line 1704, in sample_entropy
if np.isnan(x).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
From the information you've provided, the issue seems to lie in this line of code:
features_filtered_direct = extract_relevant_features(df.notnull(),

df.notnull() returns a DataFrame where each cell is either True (when the original cell's value was not null) or False (when the original cell's value was null).

The error you are facing, "ValueError: The following ids are in the time series container but are missing in y: {True}", means that tsfresh cannot find the ID True in your y target array. That ID exists only because of the transformation you performed using df.notnull(). Essentially, tsfresh is treating the boolean values as IDs for the time series, which I assume is not your intention.

Based on your explanation, I suspect that you might have intended to use df.dropna() instead of df.notnull(). The df.dropna() function would remove any rows from the DataFrame which contain null values, ensuring that your data is clean before it's passed to the extract_relevant_features function.

Try replacing df.notnull() with df.dropna() and see if the error still occurs.

PS: You write "Here is the dataframe printed just before the error". But this is not correct. Within the function call of extract_relevant_features you transform the DataFrame with df.notnull(), so the DataFrame you printed is not the one tsfresh receives. For better debugging you might want to store the transformed DataFrame in a separate variable, so that you can print/inspect what you actually pass to the function.
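
To make the difference concrete, here is a minimal sketch using only pandas. The column names id, time and value and the toy numbers are assumptions for illustration, not your real data. It shows that notnull() turns the id column into booleans, while dropna() keeps the real ids, and it stores each transformed frame in a variable so it can be inspected before it is passed on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the column names and values are assumptions,
# not the real columns from the question.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "time": [0, 1, 0, 1],
        "value": [64.0, np.nan, 84.0, 93.0],
    }
)

# notnull() keeps the shape but replaces every cell with True/False,
# so the id column no longer holds the real ids.
mask = df.notnull()
ids_after_notnull = set(mask["id"])

# dropna() keeps the original values and only removes rows that
# contain missing data.
cleaned = df.dropna()
ids_after_dropna = set(cleaned["id"])

# Inspect what would actually be handed to extract_relevant_features.
print(mask)
print(cleaned)
print(ids_after_notnull, ids_after_dropna)
```

With the boolean frame, the only id left is True, which is consistent with the {True} in the ValueError. Passing a variable such as cleaned to extract_relevant_features, instead of transforming inline, means the frame you print is the same one tsfresh sees.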