import pandas as pd
import numpy as np

input = pd.DataFrame({
    'Timestamp': [
        pd.Timestamp('19/01/2022 10:00:00'),
        pd.Timestamp('19/01/2022 10:00:00'),
        pd.Timestamp('19/01/2022 15:00:00'),
        pd.Timestamp('19/01/2022 15:30:00'),
        pd.Timestamp('19/01/2022 16:00:00'),
        pd.Timestamp('19/01/2022 19:30:00'),
        pd.Timestamp('19/01/2022 20:00:00'),
        pd.Timestamp('19/01/2022 20:30:00'),
        pd.Timestamp('20/01/2022 13:00:00'),
        pd.Timestamp('20/01/2022 13:30:00'),
        pd.Timestamp('20/01/2022 14:00:00'),
        pd.Timestamp('20/01/2022 14:50:00'),
        pd.Timestamp('20/01/2022 15:00:00')],
    'Name': [
        'A', 'B', np.nan, np.nan, np.nan,
        'C', np.nan, np.nan, np.nan, np.nan, np.nan,
        'D', np.nan]})
I am trying to forward fill multiple rows at once between timestamps, but I have not found a quick way to do it. Could you please share your solution?
Each row corresponds to a name entry at a given timestamp. There can be multiple names for the same timestamp. I want to propagate each group of names forward until the next non-NaN value.
I have tried a simple for loop, but it is relatively slow (the real array will have about 100,000 rows).
Desired output is:
desired_output = pd.DataFrame({
    'Timestamp': [
        pd.Timestamp('19/01/2022 10:00:00'),
        pd.Timestamp('19/01/2022 10:00:00'),
        pd.Timestamp('19/01/2022 15:00:00'),
        pd.Timestamp('19/01/2022 15:00:00'),
        pd.Timestamp('19/01/2022 15:30:00'),
        pd.Timestamp('19/01/2022 15:30:00'),
        pd.Timestamp('19/01/2022 16:00:00'),
        pd.Timestamp('19/01/2022 16:00:00'),
        pd.Timestamp('19/01/2022 19:30:00'),
        pd.Timestamp('19/01/2022 20:00:00'),
        pd.Timestamp('19/01/2022 20:30:00'),
        pd.Timestamp('20/01/2022 13:00:00'),
        pd.Timestamp('20/01/2022 13:30:00'),
        pd.Timestamp('20/01/2022 14:00:00'),
        pd.Timestamp('20/01/2022 14:50:00'),
        pd.Timestamp('20/01/2022 15:00:00')],
    'Name': [
        'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B',
        'C', 'C', 'C', 'C', 'C', 'C',
        'D', 'D']})
Please find my attempt below:
import time

t0 = time.time()
unique_timestamps = input.Timestamp.unique()
new_entries = []
last_valid = None
for ut in unique_timestamps:
    val = input[input.Timestamp == ut]['Name'].values
    if type(val[0]) == float and np.isnan(val[0]) and last_valid is not None:
        new_entries.append(pd.DataFrame({'Timestamp': ut,
                                         'Name': last_valid}))
    else:
        last_valid = input[input.Timestamp == ut]['Name']
output = pd.concat([input, pd.concat(new_entries)]).dropna().sort_values('Timestamp')
t1 = time.time()
print(str(t1 - t0) + 's')
You can groupby Timestamp and agg Name into a collection, then ffill and explode. I'm not sure how to do this most efficiently, but the simplest way I've found to write it is like this:
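A minimal sketch of that groupby / agg / ffill / explode approach (the input DataFrame from the question is rebuilt here as df so the snippet is self-contained; df and output are names chosen for this sketch):

import pandas as pd
import numpy as np

# Rebuild the question's input (named df to avoid shadowing the builtin input).
df = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '19/01/2022 10:00', '19/01/2022 10:00', '19/01/2022 15:00',
        '19/01/2022 15:30', '19/01/2022 16:00', '19/01/2022 19:30',
        '19/01/2022 20:00', '19/01/2022 20:30', '20/01/2022 13:00',
        '20/01/2022 13:30', '20/01/2022 14:00', '20/01/2022 14:50',
        '20/01/2022 15:00'], dayfirst=True),
    'Name': ['A', 'B', np.nan, np.nan, np.nan, 'C', np.nan, np.nan,
             np.nan, np.nan, np.nan, 'D', np.nan]})

# Collect each timestamp's names into a list (all-NaN groups stay NaN),
# forward-fill the lists, then explode back to one row per name.
output = (
    df.groupby('Timestamp')['Name']
      .agg(lambda s: list(s) if pd.notna(s.iat[0]) else np.nan)
      .ffill()
      .explode()
      .reset_index()
)
print(output)

This reproduces the desired_output rows above: the ['A', 'B'] list is carried forward over the NaN timestamps until 'C' appears, and so on.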
This follows your logic for detecting NaNs, i.e. pd.notna(s.iat[0]) takes the place of np.isnan(val[0]). There are possibly simpler aggregations, but I'm not sure they would work in general, e.g. even lambda s: s, which I'm surprised works the way it does (single elements become scalars while multiple elements become arrays). Result: