I would like to know if someone could help me solve an issue I'm facing.
First of all: I'm using Visual Studio Code.
pandas, matplotlib (which may not even be needed), statsmodels, numpy and sklearn were all installed with pip install <library>, one library at a time.
I have a .csv file from which I take an X matrix and a y vector. I'm using a backward elimination algorithm to evaluate the multiple linear regression that exists between my variables (each column of X) and the results in y. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.regression.linear_model as sm
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

regr = linear_model.LinearRegression()
dataset = pd.read_csv('file.CSV', sep=";")
features = list(dataset.iloc[:, 0:-1].columns)
features = ["const"] + features
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1]
X = np.append(arr=np.ones((X.shape[0], 1)), values=X, axis=1)

def backwardElimination(x, y):
    numVars = len(x[0])
    temp = np.zeros((len(x[0]), len(x[1]))).astype(int)
    for i in range(0, numVars):
        regressor_OLS = sm.OLS(y, x).fit()
        maxVar = max(regressor_OLS.pvalues).astype(float)
        adjR_before = regressor_OLS.rsquared_adj.astype(float)
        if maxVar > SL:
            for j in range(0, numVars - i):
                if (regressor_OLS.pvalues[j].astype(float) == maxVar):
                    temp[:, j] = x[:, j]
                    x = np.delete(x, j, 1)
                    tmp_regressor = sm.OLS(y, x).fit()
                    adjR_after = tmp_regressor.rsquared_adj.astype(float)
                    if (adjR_before >= adjR_after):
                        x_rollback = np.hstack((x, temp[:, [0, j]]))
                        x_rollback = np.delete(x_rollback, j, 1)
                        print(regressor_OLS.summary())
                        return x_rollback
                    else:
                        continue
    regressor_OLS.summary()
    return x

SL = 0.005
X_opt = X
X_Modeled = backwardElimination(X_opt, SL)
I know that print(X_opt), print(y) and print(features) all work fine: the values are loaded and displayed correctly.
But when I run the code, this error appears (with * standing in for the file path root):
  File "*\multiple_linear_regression.py", line 76, in <module>
    X_Modeled = backwardElimination(X_opt, SL)
  File "*\multiple_linear_regression.py", line 49, in backwardElimination
    regressor_OLS = sm.OLS(y, x).fit()
  File "*\linear_model.py", line 892, in __init__
    super(OLS, self).__init__(endog, exog, missing=missing,
  File "*\Programs\Python\Python39\lib\site-packages\statsmodels\regression\linear_model.py", line 713, in __init__
    weights = np.repeat(weights, len(endog))
TypeError: object of type 'float' has no len()
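
One possible reading of this traceback, as a minimal sketch rather than a confirmed diagnosis: the function is defined as backwardElimination(x, y), but it is called as backwardElimination(X_opt, SL), so inside the function y would be the float 0.005 and len(endog) would fail. The snippet below reproduces only that mechanism. It does not use statsmodels; fit_stub and backward_elimination_stub are hypothetical stand-ins, and the small arrays are made-up placeholder data.

```python
import numpy as np


def fit_stub(y, x):
    # Hypothetical stand-in for sm.OLS(y, x): like the statsmodels line shown
    # in the traceback, it calls len() on the response argument.
    return len(y), x.shape[1]


def backward_elimination_stub(x, y):
    # Same parameter order as backwardElimination(x, y) in the post.
    return fit_stub(y, x)


# Placeholder data, assumed only for illustration.
X = np.ones((4, 2))
y = np.array([1.0, 2.0, 3.0, 4.0])
SL = 0.005

# Mirrors the call in the post: SL lands in the parameter named y.
try:
    backward_elimination_stub(X, SL)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Passing the response vector in the second position avoids the error.
ok_result = backward_elimination_stub(X, y)
print(ok_result)
```

If this is indeed the cause, calling the function with the response vector as the second argument (and reading SL from the module-level variable, as the body already does) would be one way to get past this particular error; other issues in the function may remain.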
Can someone help me please? I can't find a solution anywhere.
Thank you!