I'm trying to use DataFrame.interpolate to fill in missing data. Here is my test:
from itertools import product

import numpy as np
import pandas as pd

df=pd.DataFrame.from_dict({
1.5 :[np.nan ,91.219 ,np.nan ,np.nan ,102.102 ,np.nan ,np.nan ],
2.0 :[np.nan ,np.nan ,np.nan ,np.nan ,103.711 ,np.nan ,103.031 ],
2.5 :[np.nan ,98.25 ,np.nan ,100.406 ,104.695 ,np.nan ,104.938 ],
3.0 :[np.nan ,101.578 ,np.nan ,102.969 ,104.875 ,np.nan ,105.242 ],
3.5 :[np.nan ,103.859 ,87.93 ,104.531 ,104.906 ,np.nan ,105.32 ],
4.0 :[np.nan ,105.156 ,94.469 ,105.656 ,105.844 ,89.68 ,106.523 ],
4.5 :[94.266 ,106.039 ,96.82 ,106.75 ,103.156 ,93.703 ,107.938 ],
5.0 :[97.336 ,107.953 ,98.602 ,107.906 ,104.25 ,96.547 ,109.703 ],
5.5 :[99.664 ,110.438 ,100.203 ,108.906 ,100.375 ,98.844 ,110.188 ],
6.0 :[101.344 ,112.703 ,101.492 ,108.688 ,102.906 ,100.68 ,110.5 ],
6.5 :[102.313 ,112.078 ,102.266 ,108.813 ,104.5 ,101.875 ,104 ],
7.0 :[102.656 ,114.469 ,102.242 ,108.813 ,np.nan ,102.625 ,109 ],
7.5 :[103.25 ,np.nan ,102.594 ,108.813 ,np.nan ,103.234 ,109 ],
}, orient='index')
df.plot(title='original')
# Try spline with orders 1-3, then every other method with order=3.
for int_method, int_order in list(product(['spline'], range(1, 4))) + [
    (x, 3) for x in [
        'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'polynomial',
        'krogh', 'piecewise_polynomial', 'pchip', 'akima', 'cubicspline', 'from_derivatives', 'linear',
    ]
]:
    spl = df.interpolate(limit_direction='both', method=int_method, order=int_order)
    spl.plot(title=f'{int_method},{int_order}')
It seems only spline can give me the extrapolation that I need. However, it also seems to add some unexpected fluctuations.
Can someone help me understand what is happening, and perhaps offer some advice on how to improve it? (I know "improve" is a vague term here; I can't find a clear definition for it myself.) Thanks!
This happens because pandas.DataFrame.interpolate calls scipy.interpolate internally, and scipy.interpolate sorts the x values (which appear on the y-axis in the graph) before it interpolates. That is clearly not what we want here, because it mingles the momentum on the left side with the momentum on the right side.
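To see the sorting behaviour in isolation, here is a minimal sketch that calls scipy.interpolate.interp1d directly rather than going through pandas. The toy arrays are made up for illustration and are not the data from the question, and the sketch does not check which scipy routine pandas uses for each method. By default interp1d has assume_sorted=False, so it sorts x (and reorders y to match) before interpolating:

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical toy data: x is deliberately out of order.
x = np.array([3.0, 1.0, 2.0])
y = np.array([30.0, 10.0, 20.0])

# assume_sorted=False is the default; it is spelled out here for clarity.
# interp1d sorts x and reorders y accordingly before building the interpolant.
f = interp1d(x, y, kind="linear", assume_sorted=False)
value_mid = float(f(1.5))  # evaluated between the points (1, 10) and (2, 20)

# The same result obtained by sorting manually and using plain linear interpolation.
order = np.argsort(x)
value_np = float(np.interp(1.5, x[order], y[order]))

print(value_mid, value_np)
```

Both values agree, so the order in which the samples were supplied has no effect on the result. If the original ordering of the points matters for your data, that ordering has to be encoded in some other way before interpolating.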