autogluon: Detected time series with length <= 2 in data


I am using autogluon, and I have created my train_data from my dataframe.

train_data = TimeSeriesDataFrame.from_data_frame(
    df,
    id_column="index",
    timestamp_column="timestamp"
)
train_data.head()
                        value
item_id timestamp
0       2021-10-13  1409083.0
1       2021-10-14  1416055.0
2       2021-10-15  1223615.0
3       2021-10-16  1333072.0
4       2021-10-17  1284866.0

When I run

predictor = TimeSeriesPredictor(
    prediction_length=48,
    path="autogluon-m4-hourly",
    target="target",
    eval_metric="MASE",
    ignore_time_index=True
)

I get

ValueError: Detected time series with length <= 2 in data.
 Please remove them from the dataset.

Any idea how to fix this?

Edit

My original DF looked like

df.head()
         date      total
0  2021-10-13  1409083.0
1  2021-10-14  1416055.0
2  2021-10-15  1223615.0
3  2021-10-16  1333072.0
4  2021-10-17  1284866.0

which was read from a csv file into the dataframe.

I have one value (total) for each day, and the totals are what I want to forecast into the future. My data has 730 rows (two years' worth).

Is it possible to use autogluon to forecast this data?

I created the item_id column and added it to the DataFrame so that I could pass the data to TimeSeriesPredictor.

There are 2 best solutions below

I found my issue: it was in the way I created the item_id. I assumed (wrongly!) that each row should get an increasing counter as its item_id. Since I had 730 rows, this told autogluon that I was trying to forecast 730 different items, each with a single value. In reality, I have one item with 730 values. Once I set item_id to 1 for every row, the code worked.

df.head()
         date      total  item_id
0  2021-10-13  1409083.0        1
1  2021-10-14  1416055.0        1
2  2021-10-15  1223615.0        1
3  2021-10-16  1333072.0        1
4  2021-10-17  1284866.0        1
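
For reference, here is a minimal sketch of that fix. The DataFrame is a hypothetical stand-in for the CSV data (placeholder values, same column names as in the question), and the AutoGluon import is guarded so the pandas part runs even where the library is not installed. It is an illustration of the idea, not necessarily the exact code used above.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data: one daily total per row.
df = pd.DataFrame({
    "date": pd.date_range("2021-10-13", periods=730, freq="D"),
    "total": [float(i) for i in range(730)],  # placeholder values, not real data
})

# A row counter as item_id would declare 730 series of length 1.
# A constant item_id declares a single series with 730 observations.
df["item_id"] = 1

# Number of rows per series: one entry, with 730 rows.
series_lengths = df.groupby("item_id").size()
print(series_lengths)

try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:
    TimeSeriesDataFrame = None  # AutoGluon not installed; skip the conversion

if TimeSeriesDataFrame is not None:
    train_data = TimeSeriesDataFrame.from_data_frame(
        df, id_column="item_id", timestamp_column="date"
    )
    print(train_data.head())
```

Note that with these column names the predictor's target would presumably need to be "total" rather than "target".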

If any item in the DataFrame has 2 or fewer timestamps, autogluon throws this error, because time series with length <= 2 make frequency inference impossible.

Every item_id in your data is unique, so each of your time series contains only a single row.
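
A quick way to confirm this is to count the rows per item_id and list the ones at or below the threshold. The sketch below uses only the standard library, with a few hypothetical rows copied from the question's output; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical rows mirroring the question: (item_id, timestamp, value),
# where item_id was filled with a row counter.
rows = [
    (0, "2021-10-13", 1409083.0),
    (1, "2021-10-14", 1416055.0),
    (2, "2021-10-15", 1223615.0),
    (3, "2021-10-16", 1333072.0),
    (4, "2021-10-17", 1284866.0),
]

# Length of each series = number of rows sharing the same item_id.
lengths = Counter(item_id for item_id, _, _ in rows)

# Series that would trigger the "length <= 2" error.
too_short = sorted(item for item, n in lengths.items() if n <= 2)
print(f"{len(too_short)} of {len(lengths)} series have length <= 2")
```

On a pandas DataFrame, df["item_id"].value_counts() gives the same per-item counts.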

You can (a) construct the TimeSeriesDataFrame from a flat DataFrame in which item_id and timestamp are ordinary columns (no multi-index):

   item_id  timestamp  target
0        0 2019-01-01       0
1        0 2019-01-02       1
2        0 2019-01-03       2
3        1 2019-01-01       3
4        1 2019-01-02       4
5        1 2019-01-03       5
6        2 2019-01-01       6
7        2 2019-01-02       7
8        2 2019-01-03       8

or (b) pass a DataFrame that already has a multi-index on item_id and timestamp:

item_id timestamp        target
0       2019-01-01       0
        2019-01-02       1
        2019-01-03       2
1       2019-01-01       3
        2019-01-02       4
        2019-01-03       5
2       2019-01-01       6
        2019-01-02       7
        2019-01-03       8

If you're reading from a CSV or Parquet file, you'd need the former (without a multi-index).
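
The two layouts hold the same data. As a rough pandas-only sketch (using the example values from the tables above), the flat form (a) can be turned into the multi-index form (b) with set_index, and in both cases each item has three rows, which is above the length-2 limit:

```python
import pandas as pd

# Layout (a): flat frame with item_id and timestamp as ordinary columns.
flat = pd.DataFrame({
    "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
    "target": range(9),
})

# Layout (b): the same rows, indexed by (item_id, timestamp).
multi = flat.set_index(["item_id", "timestamp"])

# Rows per item; every series has 3 observations.
lengths = multi.groupby(level="item_id").size()
print(multi)
print(lengths)
```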