Sorry for the noob question...here's my code:
from __future__ import division
import sklearn
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
X =np.array([6,8,10,14,18])
Y = np.array([7,9,13,17.5,18])
X = np.reshape(X,(1,5))
Y = np.reshape(Y,(1,5))
print X
print Y
plt.figure()
plt.title('Pizza Price as a function of Pizza Diameter')
plt.xlabel('Pizza Diameter (Inches)')
plt.ylabel('Pizza Price (Dollars)')
axis = plt.axis([0, 25, 0 ,25])
m, b = np.polyfit(X,Y,1)
plt.grid(True)
plt.plot(X,Y, 'k.')
plt.plot(X, m*X + b, '-')
#plt.show()
#training data
#x= [[6],[8],[10],[14],[18]]
#y= [[7],[9],[13],[17.5],[18]]
# create and fit linear regression model
model = LinearRegression()
model.fit(X,Y)
print 'A 12" pizza should cost $% .2f' % model.predict(19)
#work out cost function, which is residual sum of squares
print 'Residual sum of squares: %.2f' % np.mean((model.predict(x)- y) ** 2)
#work out variance (AKA Mean squared error)
xMean = np.mean(x)
print 'Variance is: %.2f' %np.var([x], ddof=1)
#work out covariance (this is whether the x axis data and y axis data correlate with eachother)
#When a and b are 1-dimensional sequences, numpy.cov(x,y)[0][1] calculates covariance
print 'Covariance is: %.2f' %np.cov(X, Y, ddof = 1)[0][1]
#test the model on new test data, printing the r squared coefficient
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
print 'R squared for model on test data is: %.2f' %model.score(X_test,y_test)
Basically, some of these functions work for the variables I have called X and Y and some don't.
For example, as the code is, it throws up this error:
TypeError: expected 1D vector for x
for the line
m, b = np.polyfit(X,Y,1)
However, when I comment out the two lines reshaping the variables like this:
#X = np.reshape(X,(1,5))
#Y = np.reshape(Y,(1,5))
I get the error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 5]
on the line
model.fit(X,Y)
So, how do I get the array to work for all the functions in my script, without having different arrays of the same data with slightly different structures?
Thanks for your help!
Change these lines
or just removed them both