LinearRegression on linnerud dataset

I am trying to fit a linear regression on the linnerud dataset and report its performance and mean squared error. I am stuck at passing the data to the model and get the error "ValueError: Found input variables with inconsistent numbers of samples: [10, 1]". The linnerud dataset has three features and three target columns, and I only want to use one feature, chin-ups. Can someone help me fix the point where I am stuck?

The following is what I have tried so far, based on the example at https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

from sklearn import datasets
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import numpy as np

linnerud = datasets.load_linnerud()
print(linnerud)

# Use only one feature
linnerud_X = linnerud.data[:, np.newaxis, 0]
print(linnerud_X)
X = np.array(linnerud_X).reshape((1,-1))
print(X)
# Split the data into training/testing sets
linnerud_X_train = linnerud_X[:-10]
linnerud_X_test = linnerud_X[-10:]
#print(linnerud_X_train)
#print(linnerud_X_test)


Y = np.array(linnerud.target).reshape((1,-1))

# Split the targets into training/testing sets
linnerud_y_train = Y
#linnerud_y_test #= Y[-10:]
print(linnerud_y_train)
#print(linnerud_y_test)

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(linnerud_X_train, linnerud_y_train)

# Make predictions using the testing set
linnerud_y_pred = regr.predict(linnerud_X_test)

I am expecting results similar to those achieved in the following example: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

There is 1 best solution below

The number of samples in the dependent and the independent variable is not the same.

>>> linnerud_y_train.shape
(1, 60)
>>> linnerud_X_train.shape
(10, 1)

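Also, the reshape you did on the target is incorrect: reshape((1,-1)) flattens the whole target array into a single row of 60 values, so the model sees 1 sample instead of one target value per sample (I'm not sure what you were trying to do there).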

The features were split into train and test, but the same split was not done on the target. That left 10 training samples in X and 1 in y, which is why you got the ValueError.
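
As a minimal sketch of the shape problem, the snippet below uses a NumPy stand-in array (an assumption for illustration, not the real data) with the same shape as linnerud.target, which has 20 samples and 3 target columns. It compares the reshape from the question with selecting a single target column and slicing it the same way as the features.

```python
import numpy as np

# Stand-in with the same shape as linnerud.target: 20 samples, 3 target columns.
# The values are placeholders; only the shapes matter here.
target = np.arange(60).reshape(20, 3)

# What the question did: flatten everything into a single row.
flattened = target.reshape((1, -1))
print(flattened.shape)      # (1, 60): one "sample" holding 60 values

# What the model needs: one target value per sample.
first_column = target[:, 0]
print(first_column.shape)   # (20,)

# Slicing the target exactly like the features keeps the sample counts equal.
y_train = first_column[:-10]
print(y_train.shape)        # (10,): matches an X_train of shape (10, 1)
```

The first dimension of X and y is the number of samples, and fit requires the two to be equal.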

But a better way to do it would be:

from sklearn import datasets, linear_model
import numpy as np

linnerud = datasets.load_linnerud()
linnerud_X = linnerud.data[:, np.newaxis, 0]   # Use only one feature, shape (20, 1)

# Split to train and test
linnerud_X_train = linnerud_X[:10]
linnerud_X_test = linnerud_X[10:]

# Use only one target column, shape (20,), and split it the same way
Y = linnerud.target[:, 0]
linnerud_y_train = Y[:10]
linnerud_y_test = Y[10:]

regr = linear_model.LinearRegression()
regr.fit(linnerud_X_train, linnerud_y_train)
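
To get the mean squared error and performance score the question asks about, the fitted model still has to predict on the test split. A minimal self-contained sketch follows, assuming column 0 of both data and target is the pair of interest (linnerud.feature_names and linnerud.target_names show which columns those are) and reusing the same 10/10 split. The mse and r2 variable names are introduced here for illustration.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

linnerud = datasets.load_linnerud()

# One feature column, kept 2-D as scikit-learn expects: shape (20, 1)
linnerud_X = linnerud.data[:, [0]]
# One target column: shape (20,)
Y = linnerud.target[:, 0]

# Apply the same split to features and target
linnerud_X_train, linnerud_X_test = linnerud_X[:10], linnerud_X[10:]
linnerud_y_train, linnerud_y_test = Y[:10], Y[10:]

regr = linear_model.LinearRegression()
regr.fit(linnerud_X_train, linnerud_y_train)

# Predict on the held-out samples and score the predictions
linnerud_y_pred = regr.predict(linnerud_X_test)
mse = mean_squared_error(linnerud_y_test, linnerud_y_pred)
r2 = r2_score(linnerud_y_test, linnerud_y_pred)

print("Coefficients:", regr.coef_)
print("Mean squared error: %.2f" % mse)
print("Coefficient of determination: %.2f" % r2)
```

With only 20 samples in the dataset, the scores depend heavily on which rows end up in the test split, so treat them as a rough indication.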