How to get rows that are last in order within pandas without using a loop

Asked by heretoinfinity on 09 December 2023 at 11:09 (25 views)

I have data that can be grouped by a column type and then ordered by another column order. I would like to know whether I can use sklearn's train_test_split to split this data so that, within each type, the rows sharing the numerically last order value are split out as the test set. In the example below, I would like the last two rows with order=3 to go into the test set.

type	order
A	1
A	1
A	2
A	2
A	3
A	3

The only way I can think of is to do it programmatically: iterate over each type, select these rows from the bigger dataframe that has multiple types, and append them to a list, dataframe or array. I am wondering if there is an alternative, using train_test_split or something within pandas, that avoids a loop.

EDIT:

I would also like to keep the top rows, with orders 1 and 2, as I need them for training.

Muhammed Yunus On 09 December 2023 at 11:34

Is the solution below suitable? It splits the rows according to whether their order equals the maximum order value in the whole dataframe.

Data:

import pandas as pd

data = {'type': ['A', 'A', 'A', 'A', 'A', 'A'],
        'order': [1, 1, 2, 2, 3, 3]}

df = pd.DataFrame(data)

Filter rows:

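# First mask: order != max (top rows); second mask: order == max (bottom rows)
top_rows, bottom_rows = [df.loc[rows] for rows
                         in [df.order.ne(df.order.max()), df.order.eq(df.order.max())]
                         ]

# display() is only available in Jupyter/IPython; print() works everywhere
print(top_rows, bottom_rows, sep='\n\n')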
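
The filter above compares every row against the maximum order of the whole dataframe, which is fine for the single-type example. The question mentions a bigger dataframe with multiple types, where each type may have a different last order. A minimal sketch of a per-type variant, assuming made-up data with a second type B (the frame and the names df_multi, group_max and is_last are illustrative, not from the original answer):

```python
import pandas as pd

# Hypothetical data: type A goes up to order 3, type B only up to order 2.
df_multi = pd.DataFrame({
    'type':  ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'order': [1, 2, 3, 3, 1, 2, 2],
})

# Maximum order within each type, broadcast back onto every row.
group_max = df_multi.groupby('type')['order'].transform('max')

# True where a row holds the largest order of its own type.
is_last = df_multi['order'].eq(group_max)

bottom_rows = df_multi[is_last]   # candidate test rows
top_rows = df_multi[~is_last]     # candidate training rows

print(bottom_rows)
print(top_rows)
```

Because transform returns a Series aligned with the original index, the mask can be applied directly without looping over the types.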

Panda Kim On 09 December 2023 at 11:36

Code

cond = df.groupby('type')['order'].transform('last').eq(df['order'])

df[cond]

    type    order
4   A       3
5   A       3

df[~cond]

    type    order
0   A       1
1   A       1
2   A       2
3   A       2
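
One caveat worth noting: transform('last') picks up the order of whichever row appears last within each type, so it matches the numerically last order only when the rows are already sorted by order inside each type, as they are in the example. If that is not guaranteed, sorting first is one option. A rough sketch, assuming hypothetical unsorted data (df_unsorted, test_rows and train_rows are names introduced here for illustration):

```python
import pandas as pd

# Hypothetical data where the rows are NOT sorted by order.
df_unsorted = pd.DataFrame({
    'type':  ['A', 'A', 'A', 'A', 'A', 'A'],
    'order': [3, 1, 2, 3, 1, 2],
})

# Sort so that the last row of each type carries that type's largest order.
df_sorted = df_unsorted.sort_values(['type', 'order'])

# Same mask as in the answer, applied to the sorted frame.
cond = df_sorted.groupby('type')['order'].transform('last').eq(df_sorted['order'])

test_rows = df_sorted[cond]     # rows with the last order of each type
train_rows = df_sorted[~cond]   # everything else

print(test_rows)
print(train_rows)
```

The original index labels are preserved by sort_values, so the selected rows can still be traced back to the unsorted dataframe.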
