pandas dataframe take rows before certain indexes

537 Views Asked by At

I have a dataframe and a list of indexes, and I want to get a new dataframe such that for each index (from the given last), I will take the all the preceding rows that matches in the value of the given column at the index.

      C1 C2 C3
0     1  2  A
1     3  4  A
2     5  4  A
3     7  5  B
4     9  7  C
5     2  3  D
6     1  1  D

The column c3 the indexes (row numbers) 2, 4 , 5 my new dataframe will be:

     C1 C2 C3
0     1  2  A
1     3  4  A
2     5  4  A
4     9  7  C
5     2  3  D

Explanation:

For index 2, rows 0,1,2 were selected because C3 equals in all of them.

For index 4, no preceding row is valid.

And for index 5 also no preceding row is valid, and row 6 is irrelevant because it is not preceding. What is the best way to do so?

1

There are 1 best solutions below

2
On BEST ANSWER

you can make conditions to filter data,if you want just preceding rows match to condition.

ind= 2
col ='C3'
# ".loc[np.arange(ind+1)]" creates indexes till preceding row, so rest of matching conditions can be ignored 
df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()

Out:

   C1   C2  C3
0   1   2   A
1   3   4   A
2   5   4   A

appying on other column

ind= 2
col ='C2'
df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()

Out:

   C1   C2  C3
1   3.0 4.0 A
2   5.0 4.0 A