Predicting intercept and coefficient for linear regression model for multiple variable

1.6k Views Asked by At

I have the following equation:

P = B0 + B1*Var1 + B2*Var2

I have the values of P, Var1 and Var2 with me. I tried to model this and then calculate the coefficients and intercept.

Below is the code and the output I am getting:

P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power

l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]


X = []
for index in range(0,len(P)):
    row = []
    row.append(t[index])
    row.append(l[index])
    X.append(row)

print "Using statsmodels"
import statsmodels.api as sm

X = sm.add_constant(X)
est = sm.OLS(P, X).fit()

print est.params[0]
print est.params[1]
print est.params[2]

I am getting the results as:

Using statsmodels
1048.32518503
0.0102496334198
0.0860026475829

Is this correct? Does est.params[0] refers to B0 of the equation? I get B0 in the range of 600-650 when I run experiments?

Can this data mismatch because of wrong data ?

1

There are 1 best solutions below

2
On BEST ANSWER

I am not familiar with statsmodels, but here is an implementation using curve_fit (see the code below). The reason for the mismatch of the model prediction and the experimental result you observe is in my opinion that your model (B0 + B1*Var1 + B2*Var2) does not describe the data well (an exponential/log/sqrt would probably be better). In the next plots I show the original data, the fit obtained by curve_fit (code below) and the fit using your parameters.

enter image description here enter image description here

As you can see, the both fitting functions give the same result, however, your data should be modelled by another function, in my opinion. If I find time, I will look for function that fits your data better.

from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]

# your model
def func(x, b0, b1, b2):

    var1, var2 = x

    return b0 + np.dot(b1, var1) + np.dot(b2, var2)

# Curve fit
coeff, _ = curve_fit(func, (l, t), P)
b0, b1, b2 = coeff[0], coeff[1], coeff[2]
print b0, b1, b2

# plot the data
xval = range(1 ,len(P)+1)
plt.scatter(xval, P, s=30, marker = "v", label='P')
plt.scatter(xval, func((l,t), *coeff), s=30, marker = "v", color="red", label='curvefit')
plt.legend(loc='upper left')
plt.figure()
plt.scatter(xval, P, s=30, marker = "v", label='P')
plt.scatter(xval, func((l, t), 1048.32518503, 0.0860026475829, 0.0102496334198 ), s=30, marker = "v",color="black",label='your parameter')
plt.legend(loc='upper left')
plt.show()
print "residuals curve_fit:",((P - func((l,t), *coeff))**2).sum()
print "residuals stats:",((P - func((l,t), 1048.32518503,0.086002647582,0.0102496334198))**2).sum()