Root Mean Squared Error - Calculation Discrepancies in Python

Asked by Misha on 02 March 2023 at 23:33

I'm seeing discrepancies between different ways of calculating root mean squared error (RMSE): some give about 0.04383 and others about 0.04312. What explains the difference? My guesses are (1) rounding or (2) statistical methodology (e.g., sample vs. population).

import numpy as np
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.metrics import mean_squared_error

# Fit an OLS regression of iprod on duration (plus a constant)
data = sm.datasets.strikes.load_pandas()
X = data.data['duration']
y = data.data['iprod']
X = add_constant(X)
model = sm.OLS(y, X)
results = model.fit()

# Six ways of computing RMSE
a = np.sqrt(results.mse_resid)
b = np.sqrt(np.dot(results.resid, results.resid) / len(results.resid))
c = np.sqrt(np.square(results.resid).mean())
d = np.sqrt(1 - results.rsquared_adj) * y.std()
# mean_squared_error expects (y_true, y_pred); the value is the same either way
e = np.sqrt(mean_squared_error(y, results.fittedvalues))
f = np.sqrt(np.linalg.norm(results.fittedvalues - y)**2 / len(y))
print("\n a = ", a, "\n b = ", b, "\n c = ", c, "\n d = ", d, "\n e = ", e, "\n f = ", f)

Results

 a =  0.043831898071428385 
 b =  0.043119136780037336 
 c =  0.043119136780037336 
 d =  0.043831898071428385 
 e =  0.043119136780037336 
 f =  0.043119136780037336

Answer by Misha on 02 March 2023 at 23:33

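The discrepancies come from the degrees-of-freedom adjustment for the number of parameters in the regression model (k, which includes the constant). results.mse_resid divides the sum of squared residuals (SSR) by n - k, whereas a plain mean of the squared residuals, including sklearn's mean_squared_error, divides by n. results.rsquared_adj applies the same n - k correction, and pandas' y.std() divides by n - 1 by default, so d reduces to sqrt(SSR / (n - k)) and matches a. Neither rounding nor a sample vs. population distinction in the usual n - 1 sense is involved.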
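To make the two denominators concrete, here is a minimal sketch in plain Python. It uses hypothetical toy data (not the strikes dataset) and a hand-rolled simple regression, so every name in it is an illustration rather than anything from statsmodels or the original post.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]

n = len(y)
k = 2  # estimated parameters: intercept and slope

# Closed-form OLS for a single regressor with an intercept.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ssr = sum(r * r for r in resid)             # sum of squared residuals
sst = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares

rmse_n = math.sqrt(ssr / n)          # not adjusted for k (like b, c, e, f)
rmse_df = math.sqrt(ssr / (n - k))   # adjusted for k (like a)

# Adjusted R^2 and the sample standard deviation (n - 1 denominator),
# combined the same way as d in the question.
r2_adj = 1 - (ssr / (n - k)) / (sst / (n - 1))
sample_std = math.sqrt(sst / (n - 1))
rmse_from_r2_adj = math.sqrt(1 - r2_adj) * sample_std

print("divide by n:      ", rmse_n)
print("divide by n - k:  ", rmse_df)
print("via adjusted R^2: ", rmse_from_r2_adj)
```

The last two printed values should agree up to floating-point noise, and the first should be smaller by a factor of sqrt((n - k) / n).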

n = len(y)
k = results.params.size  # number of estimated parameters, including the constant

# Adjusted for k
## From model results that already adjust for k
a = np.sqrt(results.mse_resid)
d = np.sqrt(1 - results.rsquared_adj) * y.std()
## From quantities that are not adjusted for k, adjusting "manually"
b1 = np.sqrt(np.dot(results.resid, results.resid) / (n - k))
c1 = np.sqrt(np.square(results.resid).mean() * (n / (n - k)))
e1 = np.sqrt(mean_squared_error(y, results.fittedvalues) * (n / (n - k)))
f1 = np.sqrt(np.linalg.norm(results.fittedvalues - y)**2 / (n - k))

# Not adjusted for k
b = np.sqrt(np.dot(results.resid, results.resid) / n)
c = np.sqrt(np.square(results.resid).mean())
e = np.sqrt(mean_squared_error(y, results.fittedvalues))
f = np.sqrt(np.linalg.norm(results.fittedvalues - y)**2 / n)

print("Adjusted for k\n a =\t", a, "\n d =\t", d, "\n b1 =\t", b1,
      "\n c1 =\t", c1, "\n e1 =\t", e1, "\n f1 =\t", f1,
      "\nNot adjusted for k\n b =\t", b, "\n c =\t", c, "\n e =\t", e, "\n f =\t", f)

Results

Adjusted for k
 a =     0.043831898071428385 
 d =     0.043831898071428385 
 b1 =    0.043831898071428385 
 c1 =    0.04383189807142839 
 e1 =    0.04383189807142839 
 f1 =    0.043831898071428385 
Not adjusted for k
 b =     0.043119136780037336 
 c =     0.043119136780037336 
 e =     0.043119136780037336 
 f =     0.043119136780037336
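
As a rough sanity check on these figures, the ratio of the two values can be compared with sqrt(n / (n - k)). The sketch below is not part of the original answer: it assumes k = 2 (the constant plus duration) and back-solves for n from the printed numbers alone.

```python
import math

# Values copied from the printed results above.
rmse_adjusted = 0.043831898071428385    # a, d, b1, f1 (denominator n - k)
rmse_unadjusted = 0.043119136780037336  # b, c, e, f (denominator n)

k = 2  # assumption: intercept plus one regressor (duration)

# (rmse_adjusted / rmse_unadjusted)^2 = n / (n - k)  =>  n = k * r / (r - 1)
r = (rmse_adjusted / rmse_unadjusted) ** 2
n_implied = k * r / (r - 1)

print("implied n:", round(n_implied, 3))
```

The implied n should come out at roughly 62, which would be the number of observations if the k = 2 assumption holds. The tiny differences in the last digit of c1 and e1 are ordinary floating-point effects from multiplying by n / (n - k) instead of dividing by n - k directly.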
