I have a problem trying to align two different pandas dataframes. Actually the time alignment works using:
import pandas as pd
import datetime
import numpy as np
import matplotlib.pyplot as plt
d1 = np.random.random_integers(0,7000,[4000,1])
d2 = np.random.random_integers(0,7000,[2000,1])
dfA = pd.DataFrame(d1)
dfB = pd.DataFrame(d2)
dfA.columns = ['data1']
dfB.columns = ['data2']
dfA['time'] = pd.date_range('1970-01-01 00:01:00', periods=dfA.shape[0], freq='1S')
dfB['time'] = pd.date_range('1970-01-01 00:00:00', periods=dfB.shape[0], freq='1S')
dfA.set_index('time', inplace=True)
dfB.set_index('time', inplace=True)
dfA1 = dfA.between_time('00:00:00', '00:09:00')
dfA2 = dfA.between_time('00:14:00', '00:16:00')
dfB1 = dfB.between_time('00:00:00', '00:12:00')
dfB2 = dfB.between_time('00:15:00', '00:16:00')
df1 = pd.concat([dfA1, dfA2])
df2 = pd.concat([dfB1, dfB2])
df_aligned = df1.join(df2, how='outer').interpolate(method='time').resample('2S').mean().fillna(method='backfill')
print(df_aligned.head())
df_aligned.plot()
plt.plot(df_aligned['data1'].values)
plt.plot(df_aligned['data2'].values)
plt.show()
However the two columns in df1 and df2 presents different time gaps and, as a result, I have new samples inside this gaps. My task is just retrieve actual data without fake samples coming from the gaps.
Any suggestion? Thank you so much in advance.
I've found a solution:
1) First of all get rid of interpolate() and put a limit=1 in fillna(). This allows long bursts of NaN vaues to remain in the data gaps. Of course you can use your fillna method and custom limit depending on the task.
2) Then, use dropna() to drop all NaN values (i.e. the values inside the time gaps)
Final results: