This ValueError occurs when running the powerdiscrepancy test from statsmodels.stats.gof.
Below is a function that prints the results of this test. However, statsmodels raises an error saying that observed and expected need to have the same number of observations, even though both arrays have the same shape.
```python
import pandas as pd


def discrepancy_test(Y, result):
    """
    Calculate discrepancy test for the variables of the classification problem

    Parameters
    ----------
    Y : np.array
        Array of true labels
    result : Result instance
        Fitted classification model
    """
    from statsmodels.stats.gof import powerdiscrepancy
    print(Y.shape, result.predict().shape)
    # powerdiscrepancy returns a (statistic, p-value) pair
    dtest = powerdiscrepancy(Y, result.predict())
    # Wrap the pair as a single row so the DataFrame gets two columns
    return pd.DataFrame([dtest], columns=['statistic', 'p_value'])


discrepancy_test(Y, result_pr)
```
Output:

```
(140,) (140,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[120], line 1
----> 1 discrepancy_test(Y, result_pr)
Cell In[119], line 15, in discrepancy_test(Y, result)
3 """
4 Calculate discrepancy test for the variables of the classification problem
5
(...)
12
13 """
14 print(Y.shape, result.predict().shape)
---> 15 dtest = powerdiscrepancy(result.predict(), Y)
16 pd.DataFrame(dtest, columns = ['variables', 'p_value'])
File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\stats\gof.py:153, in powerdiscrepancy(observed, expected, lambd, axis, ddof)
151 e = nt * e
152 else:
--> 153 raise ValueError('observed and expected need to have the same '
154 'number of observations, or e needs to add to 1')
155 k = o.shape[axis]
156 if e.shape[axis] != k:
ValueError: observed and expected need to have the same number of observations, or e needs to add to 1
```
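For reference, the wording of the error ("or e needs to add to 1") suggests that the check on line 153 compares the totals of the two arrays rather than their lengths. In other words, expected must sum to the same total as observed, or it must sum to 1. The sketch below mirrors that condition in plain NumPy. `totals_match` is a hypothetical helper written for illustration and is not a statsmodels function. The exact-equality comparison is also an assumption about how the library checks the totals.

```python
import numpy as np


def totals_match(observed, expected):
    """Hypothetical helper mirroring the condition in the error message:
    expected must have the same total as observed, or sum to 1.
    Uses exact equality (an assumption about the library's check)."""
    o_total = np.sum(observed)
    e_total = np.sum(expected)
    return bool(o_total == e_total or e_total == 1)


# 0/1 labels vs. predicted probabilities: same shape, different totals
y = np.array([0, 1, 1, 0, 1])
p_hat = np.array([0.2, 0.7, 0.6, 0.3, 0.9])
print(y.shape == p_hat.shape)   # True
print(totals_match(y, p_hat))   # False (total 3 vs. roughly 2.7)

# Binned counts vs. expected counts or probabilities: totals line up
counts = np.array([2, 3])                            # observed count per class
print(totals_match(counts, np.array([2.0, 3.0])))    # True (same total)
print(totals_match(counts, np.array([0.25, 0.75])))  # True (sums to 1)
```

With 0/1 labels on one side and predicted probabilities on the other, the totals generally differ, so matching shapes do not help. The test appears to be intended for frequency counts over k categories, meaning observed counts per bin compared with expected counts or probabilities per bin. The p-value then comes from a chi-squared distribution whose degrees of freedom depend on the number of bins, not on the number of rows.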