Error "object of type 'float' has no len()" when running a backward elimination algorithm in Python


I would like to know if someone could help me solve an issue I'm facing.

First of all: I'm using Visual Studio Code. pandas, matplotlib (which may not even be needed), statsmodels, numpy and sklearn were all installed with the command pip install *, with * being the name of each library.

I have a .csv file from which I take an X matrix and a y vector. I'm using a backward elimination algorithm to evaluate the multiple linear regression that exists between my variables (each column of X) and the results in y. Here is my code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.regression.linear_model as sm
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
regr = linear_model.LinearRegression()

dataset = pd.read_csv('file.CSV', sep=";") 
features = list(dataset.iloc[:, 0:-1].columns)
features= ["const"]+features
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:,-1]

X = np.append(arr = np.ones((X.shape[0], 1)), values = X, axis = 1)

def backwardElimination(x, y):
    numVars = len(x[0])
    temp = np.zeros((len(x[0]),len(x[1]))).astype(int)
    for i in range(0, numVars):
        regressor_OLS = sm.OLS(y, x).fit()
        maxVar = max(regressor_OLS.pvalues).astype(float)
        adjR_before = regressor_OLS.rsquared_adj.astype(float)
        if maxVar > SL:
            for j in range(0, numVars - i):
                if (regressor_OLS.pvalues[j].astype(float) == maxVar):
                    temp[:,j] = x[:, j]
                    x = np.delete(x, j, 1)
                    tmp_regressor = sm.OLS(y, x).fit()
                    adjR_after = tmp_regressor.rsquared_adj.astype(float)
                    if (adjR_before >= adjR_after):
                        x_rollback = np.hstack((x, temp[:,[0,j]]))
                        x_rollback = np.delete(x_rollback, j, 1)
                        print (regressor_OLS.summary())
                        return x_rollback
                    else:
                        continue
    regressor_OLS.summary()
    return x
                    
SL = 0.005
X_opt = X


X_Modeled = backwardElimination(X_opt, SL)

What I know is that print(X_opt), print(y) and print(features) all work fine (all the values are correctly loaded and displayed).

But when I run my code, this error pops up (with * standing in for the root of the file path):

 File "*\multiple_linear_regression.py", line 76, in <module>
   X_Modeled = backwardElimination(X_opt, SL)
 File "*\multiple_linear_regression.py", line 49, in backwardElimination
   regressor_OLS = sm.OLS(y, x).fit()
 File "*\linear_model.py", line 892, in __init__
   super(OLS, self).__init__(endog, exog, missing=missing,
 File "*\Programs\Python\Python39\lib\site-packages\statsmodels\regression\linear_model.py", line 713, in __init__
   weights = np.repeat(weights, len(endog))
TypeError: object of type 'float' has no len()

Can someone help me please? I can't find a solution anywhere.

Thank you!
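One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the function is defined as backwardElimination(x, y) but called as backwardElimination(X_opt, SL), so inside the function the name y is bound to the float 0.005 instead of the response vector. That float is then passed to sm.OLS as endog, and the traceback shows len(endog) failing. The snippet below reproduces that mechanism in plain Python. The names fit_stub, backward_elimination_stub, X_demo and y_demo are hypothetical stand-ins and the data is made up; statsmodels is deliberately not used.

```python
SL = 0.005  # significance level, same value as in the question


def fit_stub(endog, exog):
    """Hypothetical stand-in for sm.OLS: like the traceback line, it calls len(endog)."""
    return len(endog)


def backward_elimination_stub(x, y):
    # The parameter y shadows any global y: it holds whatever was passed second.
    return fit_stub(y, x)


# Made-up data, only for illustration.
X_demo = [[1.0, 2.0], [1.0, 3.0]]
y_demo = [0.5, 0.7]

try:
    # Same call shape as in the question: the float SL lands in the y parameter.
    backward_elimination_stub(X_demo, SL)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Passing the response vector as the second argument avoids the error.
rows = backward_elimination_stub(X_demo, y_demo)
print(rows)
```

If this reading is right, the corresponding change in the original script would be to call backwardElimination(X_opt, y), since the function already reads SL as a global. This only addresses the TypeError shown above; whether the rest of the function then behaves as intended would still need checking.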
