The calculation result of .rolling().std() of pandas is strange


Thank you for checking my question.

I am using .rolling().std() to compute the rolling standard deviation of a column in a DataFrame.

import numpy as np
import pandas as pd

a = pd.DataFrame([200.0, 200.0, 0.0, np.nan, np.nan, np.nan, 0.0, 0.0, 0.0,
                  0.0, 0.0, 0.0, 0.0, 0.0, 0.0], columns=["val"])

print(a['val'].rolling(window=5, min_periods=1).std(ddof=0).tolist())

# [0.0,
#  0.0,
#  94.28090415820634,
#  94.28090415820634,
#  94.28090415820634,
#  99.99999999999999,
#  0.0,
#  0.0,
#  0.0,
#  0.0,
#  0.0,
#  0.0,
#  0.0,
#  0.0,
#  0.0]

I think the results above are correct.

However, I think the results below are incorrect.

a = pd.DataFrame([200.0, 200.0, 200.0, 0.0, np.nan, np.nan, np.nan, 0.0, 0.0, 0.0,
                  0.0, 0.0, 0.0, 0.0, 0.0, 0.0], columns=["val"])

print(a['val'].rolling(window=5, min_periods=1).std(ddof=0).tolist())

# [0.0,
#  0.0,
#  0.0,
#  86.60254037844386,
#  86.60254037844386,
#  94.28090415820634,
#  100.00000000000001,
#  2.3360154559928683e-06,
#  2.3360154559928683e-06,
#  1.9073486328125e-06,
#  1.6518123698891422e-06,
#  1.4774258980588596e-06,
#  1.4774258980588596e-06,
#  1.4774258980588596e-06,
#  1.4774258980588596e-06,
#  1.4774258980588596e-06]

I expected lines 8 to 16 of the output to be exactly 0, but please let me know if I'm wrong.

Sorry for the elementary question.

I would appreciate it if someone could explain this.

When I ran the same calculation in Excel, the result for those rows was 0.

TheHungryCub On BEST ANSWER

When all the values in the rolling window are the same, the standard deviation should indeed be zero.

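However, pandas appears to compute rolling statistics incrementally, updating a running result as values enter and leave the window rather than recomputing each window from scratch. Floating-point rounding error left over from the earlier large values (200.0) can then linger after those values have left the window, so very small non-zero values such as 2.3e-06 show up instead of exact zeros.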

So, lines 8 to 16 should theoretically be zero, but the small non-zero values you see are likely due to floating-point arithmetic.
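
If exact zeros matter, two possible workarounds are sketched below using the data from the question. The first recomputes every window from scratch with np.nanstd through .rolling().apply(), which avoids carrying rounding error between windows but is slower on large data. The second keeps the fast built-in result and zeroes out anything below a tolerance. The names fast, exact, cleaned and TOL are made up for this example, and the value of TOL is an assumption that should be chosen relative to the scale of your data.

```python
import numpy as np
import pandas as pd

s = pd.Series([200.0, 200.0, 200.0, 0.0, np.nan, np.nan, np.nan,
               0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], name="val")

# Built-in rolling std: fast, but may carry tiny rounding residue.
fast = s.rolling(window=5, min_periods=1).std(ddof=0)

# Option 1: recompute each window independently with np.nanstd
# (population std, ddof=0, is its default). raw=True passes each window
# as a NumPy array with NaNs included, and nanstd ignores them.
exact = s.rolling(window=5, min_periods=1).apply(np.nanstd, raw=True)

# Option 2: keep the fast result and set values below a tolerance to 0.
# TOL is a hypothetical threshold, not a pandas setting.
TOL = 1e-5
cleaned = fast.mask(fast.abs() < TOL, 0.0)

print(exact.tolist())
print(cleaned.tolist())
```

With min_periods=1, every window in this data has at least one non-NaN value, so np.nanstd is never called on an all-NaN window.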