Can I use an xgboost global model properly if I skip step_dummy(all_nominal_predictors(), one_hot = TRUE)?


I wanted to try the xgboost global model from: https://business-science.github.io/modeltime/articles/modeling-panel-data.html

On a smaller scale it works fine (e.g. the Walmart data from the article: 7 departments, 7 ids), but what if I want to run it on 200,000 time series (ids)? step_dummy would then create another 200k columns, and my PC can't handle that (it can't even handle 14k ids).

I tried removing step_dummy, but then xgboost forecasts the same values for all ids.

My question is: how can I forecast 200k time series with a global xgboost model and get proper values for each of the 200k ids? Or is step_dummy necessary in order to create a proper forecast for all ids?

PS: the code is the same as in the link; the only difference is that my dataset has 50 monthly observations for each id.


There are 2 answers below.

Answer 1

I don't think there is a single proper answer to the question "how to forecast 200k time series." Global models are the way to go here, but you need to experiment to find out which series do not belong inside the global forecast model.

There will be a threshold, determined mostly by the length of each series, for what you put inside the global model. Also keep in mind to use several global models with different feature recipes.

If you want to avoid the step_dummy() function, use LightGBM from the bonsai package, which handles categorical predictors natively and is considerably faster and more accurate.
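
As a minimal sketch of that idea (assuming the `splits` object and the `value`/`date`/`id` columns from the linked article), the recipe can drop step_dummy() entirely because the lightgbm engine encodes factor columns as categorical features internally:

```r
library(tidymodels)
library(bonsai)   # provides the "lightgbm" engine for boost_tree()
library(timetk)   # step_timeseries_signature()

# No step_dummy(): id and the calendar label columns stay factors,
# which lightgbm handles natively as categorical features.
rec_lgbm <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(date) %>%
    step_zv(all_predictors())

wflw_lgbm <- workflow() %>%
    add_model(
        boost_tree() %>%
            set_engine("lightgbm") %>%
            set_mode("regression")
    ) %>%
    add_recipe(rec_lgbm) %>%
    fit(training(splits))
```

The rest of the article's workflow (modeltime_table(), calibration, forecasting) should work unchanged with this fitted workflow.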

Answer 2

For this model, the data must be given to xgboost in the format of a sparse matrix. That means there should not be any non-numeric columns in the data prior to the conversion (which tidymodels does under the hood at the last minute).

The traditional method for converting a qualitative predictor into a quantitative one is to use dummy variables. There are a lot of other choices though. You can use an effect encoding, feature hashing, or others too.
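
For example, here is a rough sketch of an effect (likelihood) encoding using the embed package, again assuming the recipe and the `id`/`date`/`value` columns from the linked article. It collapses the high-cardinality id factor into a single numeric column, so xgboost still receives per-id information without 200k dummy columns:

```r
library(tidymodels)
library(embed)    # step_lencode_mixed(); needs the lme4 package when prepped
library(timetk)

rec_encoded <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(date) %>%
    step_zv(all_predictors()) %>%
    # effect (likelihood) encoding: one numeric column for the many-level id
    step_lencode_mixed(id, outcome = vars(value)) %>%
    # the remaining nominal columns (calendar labels) are low-cardinality,
    # so one-hot encoding them is cheap
    step_dummy(all_nominal_predictors(), one_hot = TRUE)

# textrecipes::step_dummy_hash(id, num_terms = 32) would be a feature-hashing
# alternative that also keeps the column count small and fixed.
```

Either way, each id gets its own numeric signal, which is what lets the forecasts differ across the 200k series without the memory cost of one dummy column per id.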