completely new here, I tried looking up my problem but couldn't find anything quite similar!
I'm trying to set up a dataframe that contains the data for a schedule and its activity types. For example, if it's '1' it's a normal activity, and if it's '2' it's canceled, and compare that dataframe to another one to see if there is a date between the start/end date in the first dataframe, and if so, modify it so that it becomes 3 rows instead of 1, having the first Start/End date row until that said holiday, the holiday date row, and then the Start/End date continuing after the holiday.
I have no problem creating a single data frame, however my problem arises when I want to compare another series/data frame and potentially add rows that could be between said StartDate and EndDate.
Example Schedule dataframe
| Start Date | End Date | Activity Type |
|---|---|---|
| 2021-01-01 | 2021-12-31 | 1 |
When compared to the other dataframe
| Holiday Start Date | Holiday End Date |
|---|---|
| 2021-02-14 | 2021-02-14 |
| 2021-07-04 | 2021-07-05 |
Ending up like this:
| Start Date | End Date | Activity Type |
|---|---|---|
| 2021-01-01 | 2021-02-13 | 1 |
| 2021-02-14 | 2021-02-14 | 2 |
| 2021-02-15 | 2021-07-03 | 1 |
| 2021-07-04 | 2021-07-04 | 2 |
| 2021-07-05 | 2021-12-31 | 1 |
Any help is appreciated!
Thanks, S.
To present a more instructive example, I created Schedule as containing multiple rows:
I created Holidays as:
All date columns are of datetime64 type.
A preparatory step is to create an IntervalIndex from Holidays:
To get the result from a single row, create the following function:
To test this function you can call it on a single row, say, the initial row:
To understand fully all details, save a single row of Schedule under a variable:
Then execute each instruction from getActivities and see the intermediate results.
And to get the expected result for all rows, you have to concatenate results of application of this function to each row:
For my test data, the result is:
Fist 5 rows are from row 0 of Schedule, with 2 holiday periods. Last 3 rows are from row 1, with 1 holiday period.
Note that Activity Type is either the original value (for "normal" period) or the original value + 1 (for a holiday period), so Schedule should not contain consecutive values as Activity Type.