LassoCV results depend on the number of input variables?


I want to perform variable selection using Lasso regression, as I am not sure how many (lagged) variables X still have an effect on my variable y. However, the resulting model, and also which variables end up being zero, differs depending on the number of input variables.

For example, I have n=295 observations. If I use LassoCV on 10 lagged input variables, the 5th lag's coefficient is shrunk to 0. If I use only 8 lagged input variables, the 4th lag might turn out 0 instead. So my variable selection depends on the number of variables I start with, and I therefore think the result can't be trusted. What am I doing wrong?

Since in my application n > p, I don't think it has to do with multiple minima in the Lasso criterion. I do get ConvergenceWarnings very often, so I increased both the number of iterations and the tolerance. I am not very familiar with the duality gap, but it does seem very large. Maybe the error lies here?

ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 31069.631120879843, tolerance: 12362.796494481028
  model = cd_fast.enet_coordinate_descent_gram(
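One thing I have been wondering about is feature scaling: as far as I understand, scikit-learn compares the duality gap against a tolerance that is scaled by the magnitude of y, so features on very different scales can both slow convergence and distort which coefficients get shrunk to zero. A minimal sketch of what I mean by standardizing before LassoCV (synthetic data here, not my actual series):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for my data: 295 rows, 10 features on a large scale.
X = rng.normal(size=(295, 10)) * 1000.0
y = 0.5 * X[:, 0] + rng.normal(size=295)

cv = TimeSeriesSplit(n_splits=5)

# Standardize inside a pipeline so LassoCV sees unit-variance features.
model = make_pipeline(
    StandardScaler(),
    LassoCV(alphas=np.logspace(-4, 2, 50), cv=cv, max_iter=100_000),
)
model.fit(X, y)

# Coefficients are on the standardized scale.
coef = model.named_steps["lassocv"].coef_
```

With the scaler in place I would expect the penalty to treat all lags symmetrically, rather than effectively penalizing large-scale columns less.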

My code in Python:

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit

n_lambs = 50
cv = TimeSeriesSplit(
    n_splits=5,
    gap=0
)

model = LassoCV(
    alphas=np.logspace(-4, 2, n_lambs),
    fit_intercept=True,
    cv=cv,
    n_jobs=-1,
    max_iter=1000000,
    tol=0.001
)

fit = model.fit(X_train, y_train)

In this code, I vary my X_train variable to contain different numbers of lags.
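For reference, I build the lagged design matrix roughly like this (`lag_matrix` is a simplified sketch, not my exact code). One thing it makes obvious: changing the number of lags also changes how many rows survive the alignment, so the fits with 8 and 10 lags are not even run on the same sample.

```python
import numpy as np

def lag_matrix(series, n_lags):
    """Stack lags 1..n_lags of a 1-D series into columns.

    The first n_lags observations are dropped so every row has a
    complete set of lags; column k holds the series lagged by k steps.
    """
    series = np.asarray(series)
    cols = [series[n_lags - k : len(series) - k] for k in range(1, n_lags + 1)]
    return np.column_stack(cols)

# Toy series: values 0..9, so lags are easy to read off.
y_full = np.arange(10.0)

X8 = lag_matrix(y_full, 8)  # only 2 usable rows remain
X3 = lag_matrix(y_full, 3)  # 7 usable rows remain
```

So with n = 295 and 10 lags, the model only ever sees 285 aligned rows, and each choice of lag count shifts that window.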

(By the way, any suggestions on different variable selection methods would be appreciated too.)
