Gradient Boosting using Python - General Question


What I want to achieve.

My data is daily natural gas price settlements, in the following format:

Column A: individual rows from December 2018 to December 2026
Column B: opening price of gas from December 2018 to December 2026
Column C: previous price of gas from December 2018 to December 2026

I want to use a gradient boosting algorithm in Python to predict prices beyond December 2026. As far as I understand, after building the DMatrix and running the subsequent commands, the algorithm typically returns an array of predictions, and a few more steps then produce a scatter plot.

Question.

Given that array of generated predictions, I am lost on what to do next to predict prices for December 2026 and beyond. My scatter plot only covers the training and test sets and the predictions made on them, but what about the future years, which are what I am actually interested in?

Best answer

If you don't have data for the years beyond 2026, you have no way of knowing how well your model performs for those years (this is tautological).

I think one thing you can do in that case is define your train, validation and test splits based on a datetime index of your data, so that each split covers a later period than the one before it. By preventing your model from "seeing the future" in training, you can get a decent idea of how predictable your target is by measuring the model's performance on "future" holdout data after you train. Presumably, as the maintainer of the model, you would then update your predictions (and iterate on training) as new years of data become available.
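To make the date-based split concrete, here is a minimal sketch. It is not your exact setup: the prices are synthetic random-walk data I made up, the column names `open` and `prev` are placeholders, the cut-off dates are arbitrary, and I use scikit-learn's GradientBoostingRegressor rather than xgboost to keep the example dependency-light.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the settlement data (assumption: NOT real gas prices).
rng = np.random.default_rng(0)
dates = pd.date_range("2018-12-01", "2026-12-31", freq="D")
prices = 3.0 + np.cumsum(rng.normal(0, 0.02, len(dates)))
df = pd.DataFrame({"open": prices}, index=dates)

# Hypothetical feature: the previous row's price.
df["prev"] = df["open"].shift(1)
df = df.dropna()

# Chronological split: each block is strictly later than the one before it.
# The cut-off dates are arbitrary choices for illustration.
train = df.loc[:"2024-12-31"]
valid = df.loc["2025-01-01":"2025-12-31"]
test = df.loc["2026-01-01":]

# Fit only on the earliest block so the model never "sees the future".
model = GradientBoostingRegressor(random_state=0)
model.fit(train[["prev"]], train["open"])

# Error on the later blocks approximates how the model would do on unseen years.
valid_mae = mean_absolute_error(valid["open"], model.predict(valid[["prev"]]))
test_mae = mean_absolute_error(test["open"], model.predict(test[["prev"]]))
print(f"validation MAE: {valid_mae:.4f}, test MAE: {test_mae:.4f}")
```

The validation block is there for tuning decisions. The test block should only be scored once at the end, since it is your best proxy for the years you don't have yet.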

I guess I should also point out that you haven't shared a compelling reason why xgboost, and only xgboost, will do for this problem. For models that may go into production, I would encourage you to run some simpler regressions or other cheaper algorithms and compare their performance. If you haven't checked out some of the model selection tools out there, I think it would be worth your while! An easy one to get started with is scikit-learn's GridSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
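As a rough sketch of that comparison, the snippet below scores a plain linear regression against a small grid of gradient boosting settings. The data is again synthetic, the parameter grid is an arbitrary example, and pairing GridSearchCV with TimeSeriesSplit is my own suggestion so that every fold validates on data later than what it trained on.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score

# Synthetic price series (assumption: NOT real data), ordered in time.
rng = np.random.default_rng(0)
prices = 3.0 + np.cumsum(rng.normal(0, 0.02, 600))
X = prices[:-1].reshape(-1, 1)  # previous price as the single feature
y = prices[1:]                  # next price as the target

# Each fold trains on an earlier stretch and validates on the stretch after it.
cv = TimeSeriesSplit(n_splits=4)

# Cheap baseline: ordinary linear regression.
baseline_mae = -cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_mean_absolute_error"
).mean()

# Small, arbitrary grid over two gradient boosting hyperparameters.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=cv,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
boosted_mae = -search.best_score_

print(f"linear baseline MAE: {baseline_mae:.4f}")
print(f"best boosted MAE:    {boosted_mae:.4f} with {search.best_params_}")
```

If the boosted model does not clearly beat the cheap baseline on these time-ordered folds, that is a good hint the extra complexity isn't buying you much.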