I have a pandas data frame with a column where a condition is met based on other elements in the data frame (not shown). Additionally, I have a column that extends the validness one row further with the following rule:
If a valid row is followed directly by ExtendsValid, that row is also valid, even if the underlying valid condition doesnt apply. Continue filling valid forward as long as ExtendsValid is 1
I have illustrated the result in column "FinalValid" (desired result. Doesnt have to be a new column, can also fill Valid forward). Note that rows 8 and 9 in the example also become valid. Also note that row 13 does NOT result in FinalValid, because you need a preceding valid row. Preceding valid row can be Valid or an extended valid row.
So far if I had a problem like that I did a cumbersome multi-step process:
- Create a new column for when "Valid" or "ExtendsValid" is true
- Create a new column marking the start point for each "sub-series" (a consecutive set of ones)
- Number each sub-series
- fillna using "group by" for each sub series
I can provide sample code but I am really looking for a totally different, more efficient approach, which of course must be non-iterating as well.
Any ideas would be welcome.
# | Valid | ExtendsValid | FinalValid |
---|---|---|---|
1 | 0 | 0 | 0 |
2 | 1 | 0 | 1 |
3 | 0 | 0 | 0 |
4 | 0 | 0 | 0 |
5 | 1 | 0 | 1 |
6 | 0 | 0 | 0 |
7 | 1 | 0 | 1 |
8 | 0 | 1 | 1 |
9 | 0 | 1 | 1 |
10 | 0 | 0 | 0 |
11 | 1 | 0 | 1 |
12 | 0 | 0 | 0 |
13 | 0 | 1 | 0 |
14 | 0 | 0 | 0 |
IIUC, you want to ffill the 1s only if there is an uninterrupted series of 1s starting on Valid and eventually continuing on ExtendsValid.
For this you can use a
groupby.cummin
:Output:
NB. I slightly modified the input on row 3 to better illustrate the logic.