I have 2 data frames consisting of times. I'd like to find the minimum time between ALL df1 times and EACH df2 when df2['Start Time'] - df1['Stop Time'] = dt is positive. For instance:
df1
Stop Time Site 2023-10-17 20:10:00.310 P2 2023-10-17 21:20:00.440 P1 2023-10-17 23:30:00.200 P2 2023-10-18 00:00:00.190 P1 2023-10-18 01:00:00.130 P1 2023-10-18 02:00:00.500 P2 2023-10-18 03:00:00.480 P1 2023-10-18 04:00:00.020 P2 2023-10-18 05:00:00.000 P1 2023-10-18 06:00:00.580 P2
df2
Start Time Site 2023-10-17 16:00:00.190 SMR 2023-10-17 17:05:00.050 SMR 2023-10-17 19:10:00.550 SMR 2023-10-17 21:40:00.530 SMR 2023-10-17 22:21:00.180 SMR 2023-10-18 05:21:00.090 SMR 2023-10-18 09:15:00.360 SMR 2023-10-18 11:54:00.160 SMR
So for this dataset, the FIRST positive differences are for df2: 2023-10-17 21:40:00.530 and df1: 2023-10-17 20:10:00.310 AND 2023-10-17 21:20:00.440. The minimum that I'd like to keep in a new df_best dataframe is between 2023-10-17 21:40:00.530 - 2023-10-17 21:20:00.440 = 20 min along with the site name P1. So the first entry would be:
df_best
diff_min Site 5 P1
The last d2 entry, 2023-10-18 11:54:00.160, would have a min with the last entry in d1... about 5hrs and 54min.
I can do this with a couple for loops, but I bet there's a cool pandas way to do this quickly.
Thx,
You don't need to find all matches, just the closest one in the wanted direction.
For that, use a
merge_asof:Output:
You can further
dropnaif you're only interested indf2rows with a match.