Grid search over 2 sets of parameters for Lasso Cross Validation in Time Series Data?


I am familiar with GridSearchCV in sklearn and have used it to grid search over a set of parameters on non-time-series data, but I'm not sure how to do this when one of the parameters I want to optimize is the size of the training window for my LASSO regression. I have market data in which each data point corresponds to an hour-long interval, and I am trying to fit a LASSO model that trains on some lookback (say, the previous 60 hours) and forecasts volatility for the next hour. I want to optimize over both this lookback window (i.e. the training window) and the Lasso regularization penalty.

My current approach is to hard-code the model to step through time for a given training window: fit on the window, forecast the next hour, shift the window forward by one hour, forecast the next hour, and so on. I grid search over a set of LASSO penalty values (alpha in sklearn) with a fixed training window, select the optimal lambda, then grid search over a set of training window sizes with that lambda held fixed. However, this is very inefficient, and because the two parameters are tuned one at a time rather than jointly, it might not find the best pair.
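To make the stepping scheme above concrete, here is a minimal sketch of the index bookkeeping only, with no model involved. The helper name `rolling_window_splits` and its `horizon` argument are made up for illustration; they are not part of sklearn.

```python
from typing import Iterator, Tuple


def rolling_window_splits(
    n_samples: int, window: int, horizon: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for a walk-forward evaluation.

    Each training range covers `window` consecutive observations and each
    test range covers the `horizon` observations immediately after it, so
    a test index never precedes any of its training indices.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    # Slide the window forward one observation (one hour) at a time.
    for start in range(0, n_samples - window - horizon + 1):
        train = range(start, start + window)
        test = range(start + window, start + window + horizon)
        yield train, test


# Example: 6 hourly observations, train on 3 hours, forecast the next hour.
splits = [(list(tr), list(te)) for tr, te in rolling_window_splits(6, 3)]
for tr, te in splits:
    print(tr, "->", te)
```

Because every test index comes strictly after its training indices, a loop built on these splits should not leak future data into the fit, provided the feature columns themselves contain no look-ahead.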

What I have so far is this:

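import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# BlockingTimeSeriesSplit is a custom splitter defined elsewhere;
# it is not part of scikit-learn.

scores0 = []
param_search = {'alpha': np.logspace(-4, 0, 15)}

# Features are every column except the last; the target is the last column.
X = df.iloc[:, :-1]
Y = df.iloc[:, -1]

btscv = BlockingTimeSeriesSplit(n_splits=200)
# NOTE: as written, nothing changes between iterations, so each pass
# appears to repeat the same search.
for i in range(30):
    model = Lasso()

    finder0 = GridSearchCV(
        estimator=model,
        param_grid=param_search,
        scoring='r2',
        n_jobs=4,
        cv=btscv,
        verbose=1,
        pre_dispatch=8,
        error_score=-999,
        return_train_score=True,
    )

    finder0.fit(X, Y)

    best_params0 = finder0.best_params_
    best_score0 = round(finder0.best_score_, 4)
    scores0.append(best_score0)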

However, this only finds the optimal alpha for Lasso with n_splits fixed at 200 in my blocked time series split. I want to find the optimal alpha AND n_splits jointly, since my model is a rolling Lasso regression.
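One way the joint search might be set up is sketched below. It is an illustration under several assumptions, not a tested solution for the real data: it swaps the custom BlockingTimeSeriesSplit for sklearn's built-in `TimeSeriesSplit`, using `max_train_size` as the rolling training window and `test_size=1` for the one-hour-ahead forecast (the `test_size` argument needs scikit-learn 0.24 or later). The data is synthetic, and the candidate `windows`, the `n_forecasts` value, and the shortened alpha grid are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the hourly market data (hypothetical values).
rng = np.random.default_rng(0)
n_samples, n_features = 400, 5
X = rng.normal(size=(n_samples, n_features))
y = X @ np.array([0.5, 0.0, -0.3, 0.0, 0.2]) + 0.1 * rng.normal(size=n_samples)

alphas = np.logspace(-4, 0, 5)
windows = [30, 60, 120]  # candidate training-window lengths, in hours
n_forecasts = 50         # number of one-step-ahead forecasts to score

results = {}
for window in windows:
    # Rolling window: at most `window` past hours for training, then one
    # test hour. The test hours are the last `n_forecasts` rows for every
    # window, so scores are comparable across window sizes.
    cv = TimeSeriesSplit(n_splits=n_forecasts, max_train_size=window, test_size=1)
    search = GridSearchCV(
        Lasso(max_iter=10_000),
        param_grid={"alpha": alphas},
        # R^2 is undefined on a single test point, so score with MSE instead.
        scoring="neg_mean_squared_error",
        cv=cv,
    )
    search.fit(X, y)
    results[window] = (search.best_params_["alpha"], search.best_score_)

# Every (window, alpha) pair has now been scored on the same test hours.
best_window = max(results, key=lambda w: results[w][1])
best_alpha, best_score = results[best_window]
print(f"window={best_window}, alpha={best_alpha:.4g}, neg MSE={best_score:.4f}")
```

The outer loop over windows combined with the inner grid over alpha evaluates every pair, so the selection is joint rather than sequential. Since the winning pair is chosen on these same test hours, its score is likely optimistic; holding out a final stretch of data that the search never sees would give a fairer estimate.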

Does anyone have experience doing this without introducing data leakage into the cross-validation? Thanks.
