I want to use interpolation to replace some missing values in dataframe. Here is the code to replicate my sample data.
import numpy as np
import pandas as pd
np.random.seed(12345)
# simulate a yield-curve df with dimension 250 (time) x 10 (maturities)
df = pd.DataFrame(np.random.randn(250, 10), columns=np.arange(1, 11), index=pd.date_range('2015-01-01', periods=250, freq='B'))
# set 20% data to NaN
sel = np.random.choice([True, False], size=(250, 10), p=[0.2, 0.8])
df.values[sel] = np.nan
print(df)
1 2 3 ... 8 9 10
2015-01-01 -0.2047 0.4789 NaN ... 0.2817 0.7690 NaN
2015-01-02 1.0072 NaN 0.2750 ... -0.3718 1.6690 -0.4386
2015-01-05 -0.5397 0.4770 NaN ... NaN 0.0009 1.3438
2015-01-06 -0.7135 -0.8312 -2.3702 ... 0.1198 -1.0635 0.3329
2015-01-07 -2.3594 -0.1995 -1.5420 ... -0.7539 0.3313 NaN
... ... ... ... ... ... ... ...
2015-12-10 -1.2566 -0.3499 0.8317 ... -1.3944 -0.8628 1.3491
2015-12-11 0.6566 -0.8118 -0.3346 ... 0.7839 0.0165 NaN
2015-12-14 -0.1559 -2.8151 NaN ... -0.3437 -1.4282 0.9694
2015-12-15 -1.0264 NaN -2.2169 ... 1.2570 1.6857 0.7227
2015-12-16 0.9407 0.5943 0.1862 ... -0.5998 NaN 0.9468
[250 rows x 10 columns]
Interpolation column-wise based on 'time method' works.
# column-wise time series interpolation works nicely
df1 = df.copy()
df1.interpolate(method='time', axis=0, limit=10, inplace=True)
print(df1)
1 2 3 ... 8 9 10
2015-01-01 -0.2047 0.4789 NaN ... 0.2817 0.7690 NaN
2015-01-02 1.0072 0.4785 0.2750 ... -0.3718 1.6690 -0.4386
2015-01-05 -0.5397 0.4770 -1.7089 ... -0.0031 0.0009 1.3438
2015-01-06 -0.7135 -0.8312 -2.3702 ... 0.1198 -1.0635 0.3329
2015-01-07 -2.3594 -0.1995 -1.5420 ... -0.7539 0.3313 -0.1842
... ... ... ... ... ... ... ...
2015-12-10 -1.2566 -0.3499 0.8317 ... -1.3944 -0.8628 1.3491
2015-12-11 0.6566 -0.8118 -0.3346 ... 0.7839 0.0165 1.2542
2015-12-14 -0.1559 -2.8151 -1.7463 ... -0.3437 -1.4282 0.9694
2015-12-15 -1.0264 -1.1104 -2.2169 ... 1.2570 1.6857 0.7227
2015-12-16 0.9407 0.5943 0.1862 ... -0.5998 1.6857 0.9468
[250 rows x 10 columns]
But interpolation row-by-row using cubic spline doesn't work at all?
# row-wise cross-maturity interpolation (cubic spline) doesn't work at all?
df2 = df.copy()
df2.interpolate(method='cubic', axis=1, inplace=True)
print(df2)
1 2 3 ... 8 9 10
2015-01-01 -0.2047 0.4789 NaN ... 0.2817 0.7690 NaN
2015-01-02 1.0072 NaN 0.2750 ... -0.3718 1.6690 -0.4386
2015-01-05 -0.5397 0.4770 NaN ... NaN 0.0009 1.3438
2015-01-06 -0.7135 -0.8312 -2.3702 ... 0.1198 -1.0635 0.3329
2015-01-07 -2.3594 -0.1995 -1.5420 ... -0.7539 0.3313 NaN
... ... ... ... ... ... ... ...
2015-12-10 -1.2566 -0.3499 0.8317 ... -1.3944 -0.8628 1.3491
2015-12-11 0.6566 -0.8118 -0.3346 ... 0.7839 0.0165 NaN
2015-12-14 -0.1559 -2.8151 NaN ... -0.3437 -1.4282 0.9694
2015-12-15 -1.0264 NaN -2.2169 ... 1.2570 1.6857 0.7227
2015-12-16 0.9407 0.5943 0.1862 ... -0.5998 NaN 0.9468
[250 rows x 10 columns]
Comparing df and df2 shows that they are equal, so no changes have been made by interpolation. I am not sure where I get wrong. Any help is highly appreciated.
import pandas.util.testing as pdt
pdt.assert_frame_equal(df, df2)
As JohnE suggests, it seems that the inplace=True
causes the problem for row-by-row interpolation. After dropping it, it now works.
print(df2.interpolate(method='linear', axis=1))
1 2 3 ... 8 9 10
2015-01-01 -0.2047 0.4789 -0.0384 ... 0.2817 0.7690 0.7690
2015-01-02 1.0072 0.6411 0.2750 ... -0.3718 1.6690 -0.4386
2015-01-05 -0.5397 0.4770 -0.2721 ... 0.1518 0.0009 1.3438
2015-01-06 -0.7135 -0.8312 -2.3702 ... 0.1198 -1.0635 0.3329
2015-01-07 -2.3594 -0.1995 -1.5420 ... -0.7539 0.3313 0.3313
... ... ... ... ... ... ... ...
2015-12-10 -1.2566 -0.3499 0.8317 ... -1.3944 -0.8628 1.3491
2015-12-11 0.6566 -0.8118 -0.3346 ... 0.7839 0.0165 0.0165
2015-12-14 -0.1559 -2.8151 -0.1460 ... -0.3437 -1.4282 0.9694
2015-12-15 -1.0264 -1.6216 -2.2169 ... 1.2570 1.6857 0.7227
2015-12-16 0.9407 0.5943 0.1862 ... -0.5998 0.1735 0.9468
[250 rows x 10 columns]