Python: Linear Regression, reshaping numpy arrays for use in model


Sorry for the noob question... here's my code:

from __future__ import division
import sklearn
import numpy as np
from scipy import stats 
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

X =np.array([6,8,10,14,18])
Y = np.array([7,9,13,17.5,18])
X = np.reshape(X,(1,5))
Y = np.reshape(Y,(1,5))

print X
print Y

plt.figure()
plt.title('Pizza Price as a function of Pizza Diameter')
plt.xlabel('Pizza Diameter (Inches)')
plt.ylabel('Pizza Price (Dollars)')
axis = plt.axis([0, 25, 0 ,25])
m, b = np.polyfit(X,Y,1)
plt.grid(True)
plt.plot(X,Y, 'k.')
plt.plot(X, m*X + b, '-')

#plt.show()


#training data
#x= [[6],[8],[10],[14],[18]]
#y= [[7],[9],[13],[17.5],[18]]

# create and fit linear regression model
model = LinearRegression()
model.fit(X,Y)
print 'A 12" pizza should cost $% .2f' % model.predict(19)

#work out cost function, which is residual sum of squares
print 'Residual sum of squares: %.2f' % np.mean((model.predict(x)- y) ** 2)

#work out variance (AKA Mean squared error)
xMean = np.mean(x)
print 'Variance is: %.2f' %np.var([x], ddof=1)

#work out covariance (this is whether the x axis data and y axis data correlate with each other)
#When a and b are 1-dimensional sequences, numpy.cov(x,y)[0][1] calculates covariance
print 'Covariance is: %.2f' %np.cov(X, Y, ddof = 1)[0][1]


#test the model on new test data, printing the r squared coefficient
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
print 'R squared for model on test data is: %.2f' %model.score(X_test,y_test)

Basically, some of these functions work for the variables I have called X and Y and some don't.

For example, as the code is, it throws up this error:

TypeError: expected 1D vector for x 

for the line

m, b = np.polyfit(X,Y,1)
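The `TypeError` comes from the reshape: `np.reshape(X, (1, 5))` turns the 1D vector into a 2D array with one row and five columns, and `np.polyfit` only accepts a 1D `x`. A quick shape check, assuming a recent NumPy and Python 3 (the question's script is Python 2); `X_row` is just an illustrative name:

```python
import numpy as np

X = np.array([6, 8, 10, 14, 18])
Y = np.array([7, 9, 13, 17.5, 18])
print(X.shape)  # (5,)   a plain 1D vector

# The reshape in the question produces a 2D array: 1 row, 5 columns
X_row = np.reshape(X, (1, 5))
print(X_row.shape)  # (1, 5)

# polyfit checks the number of dimensions of x and rejects anything but 1D
try:
    np.polyfit(X_row, Y, 1)
except TypeError as err:
    print(err)  # expected 1D vector for x

# With the original 1D arrays the straight-line fit works
m, b = np.polyfit(X, Y, 1)
print(round(m, 4), round(b, 4))  # 0.9763 1.9655
```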

However, when I comment out the two lines reshaping the variables like this:

#X = np.reshape(X,(1,5))
#Y = np.reshape(Y,(1,5))

I get the error:

ValueError: Found input variables with inconsistent numbers of samples: [1, 5]

on the line

model.fit(X,Y)
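This second error is about the opposite requirement: scikit-learn reads `X` as `(n_samples, n_features)`. As far as I know, depending on the scikit-learn version a 1D `X` is either treated as a single row (which is where the `[1, 5]` comes from: 1 sample in `X` versus 5 targets in `Y`) or rejected outright with an "Expected 2D array" error. Five pizzas with one feature each (the diameter) is shape `(5, 1)`. A minimal sketch, assuming scikit-learn and NumPy are installed; `X_col` is an illustrative name:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([6, 8, 10, 14, 18])
Y = np.array([7, 9, 13, 17.5, 18])

# (1, 5) would mean "1 pizza described by 5 features".
# One column per feature, one row per pizza, is (5, 1).
X_col = X.reshape(-1, 1)  # -1 lets NumPy infer the row count (5 here)
print(X_col.shape)  # (5, 1)

model = LinearRegression()
model.fit(X_col, Y)  # the target can stay 1D, shape (5,)
print(model.coef_, model.intercept_)
```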

So, how do I get the array to work for all the functions in my script, without having different arrays of the same data with slightly different structures?

Thanks for your help!

Best answer:

Change these lines to:

X = np.reshape(X,(5))
Y = np.reshape(Y,(5))

or just remove them both.
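Keeping `X` and `Y` 1D fixes `np.polyfit`, but `model.fit`, `model.predict` and `model.score` still want a 2D feature array. One way to keep a single copy of the data is to store it 1D and pass `X[:, np.newaxis]`, a `(5, 1)` view of the same memory, only at the scikit-learn call sites. The sketch below is the question's script reworked along those lines, assuming Python 3, NumPy and scikit-learn, with plotting left out. It also addresses a few separate issues in the original: the lowercase `x`/`y` it uses were never defined, `predict(19)` was printed as the price of a 12" pizza, and the "residual sum of squares" line actually computed the mean of the squared residuals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Store the data once, as 1D arrays: polyfit, var, cov and plotting accept this
X = np.array([6, 8, 10, 14, 18])
Y = np.array([7, 9, 13, 17.5, 18])

m, b = np.polyfit(X, Y, 1)

# X[:, np.newaxis] is a (5, 1) view of X (no copy): 5 samples, 1 feature
model = LinearRegression()
model.fit(X[:, np.newaxis], Y)

# predict also takes 2D input: one sample with one feature
price_12 = model.predict([[12]])[0]
print('A 12" pizza should cost $%.2f' % price_12)

residuals = model.predict(X[:, np.newaxis]) - Y
print('Residual sum of squares: %.2f' % np.sum(residuals ** 2))
print('Mean squared error: %.2f' % np.mean(residuals ** 2))

# Sample variance and covariance work directly on the 1D arrays
print('Variance of X: %.2f' % np.var(X, ddof=1))
print('Covariance of X and Y: %.2f' % np.cov(X, Y, ddof=1)[0][1])

# Test data from the question, stored 1D and viewed as a column for score()
X_test = np.array([8, 9, 11, 16, 12])
y_test = np.array([11, 8.5, 15, 18, 11])
print('R squared on test data: %.2f' % model.score(X_test[:, np.newaxis], y_test))
```

`X.reshape(-1, 1)` would work just as well as `X[:, np.newaxis]`; both return views rather than copies here.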