I have the following dataframe:
   A  B  C
0  1  1  1
1  0  1  0
2  1  1  1
3  1  0  1
4  1  1  0
5  1  1  0
6  0  1  1
7  0  1  0
For each column, I want to know the start and end index of every run where the value is 1 for 3 or more consecutive rows. Desired outcome:
Column From To
A 2 5
B 0 2
B 4 7
First, I zero out the 1s that are not part of a run of 3 or more consecutive values:
filtered_df = df.copy().apply(filter, threshold=3)
where
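def filter(col, threshold=3):
    # Label each run of identical values and flag runs shorter than threshold
    mask = col.groupby((col != col.shift()).cumsum()).transform('count').lt(threshold)
    # Only short runs of 1s should be reset
    mask &= col.eq(1)
    # Overwrite those 1s with 0 in place
    col.update(col.loc[mask].replace(1,0))
    return col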
filtered_df now looks like this:
   A  B  C
0  0  1  0
1  0  1  0
2  1  1  0
3  1  0  0
4  1  1  0
5  1  1  0
6  0  1  0
7  0  1  0
If the dataframe had only one column of zeros and ones, the result could be achieved as in "How to use pandas to find consecutive same data in time series". However, I am struggling to do something similar for multiple columns at once.
Use DataFrame.pipe to apply a function to the whole DataFrame.
In the first solution, get the first and last index of each run of consecutive 1s per column, append the outputs to a list, and finally concat them.
Or first reshape with unstack and then apply the same solution.
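A minimal sketch of the two approaches described above, assuming the sample data from the question. The helper names runs_per_column and runs_unstacked, and the intermediate column names (Length, idx, value, run), are hypothetical and chosen for illustration only; this is one possible way to implement the idea, not necessarily the exact code the answer used.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 0, 1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1, 1, 1, 1],
    "C": [1, 0, 1, 1, 0, 0, 1, 0],
})


def runs_per_column(df, threshold=3):
    """Hypothetical helper: loop over columns, collect runs, then concat."""
    parts = []
    for name, col in df.items():
        # A new run id starts whenever the value differs from the previous row.
        run_id = col.ne(col.shift()).cumsum()
        is_one = col.eq(1)
        labels = col.index.to_series()
        # First and last index label of every run of 1s, plus its length.
        runs = (labels[is_one]
                .groupby(run_id[is_one])
                .agg(From="first", To="last", Length="count"))
        runs = runs[runs["Length"].ge(threshold)]
        parts.append(runs.assign(Column=name))
    out = pd.concat(parts, ignore_index=True)
    return out[["Column", "From", "To"]]


def runs_unstacked(df, threshold=3):
    """Hypothetical helper: reshape to long format first, then group once."""
    # One row per (column, index) pair.
    long = (df.unstack()
              .rename_axis(["Column", "idx"])
              .reset_index(name="value"))
    # A new run starts when the value changes or a new column begins.
    new_run = (long["value"].ne(long["value"].shift())
               | long["Column"].ne(long["Column"].shift()))
    long["run"] = new_run.cumsum()
    ones = long[long["value"].eq(1)]
    out = (ones.groupby(["Column", "run"])["idx"]
               .agg(From="first", To="last", Length="count")
               .reset_index())
    keep = out["Length"].ge(threshold)
    return out.loc[keep, ["Column", "From", "To"]].reset_index(drop=True)


# DataFrame.pipe passes the whole frame to the function.
result_loop = df.pipe(runs_per_column, threshold=3)
result_long = df.pipe(runs_unstacked, threshold=3)
```

On the sample data both calls return three rows: A from 2 to 5, B from 0 to 2, and B from 4 to 7. Column C has no run of three or more 1s, so it does not appear. Neither function needs the intermediate filtered_df, since run lengths are checked directly.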