I have data that can be grouped by a `type` column and then ordered by another column, `order`. I would like to know if I can use sklearn's `train_test_split` to split this data so that the rows sharing the numerically last (highest) `order` value are split out as the test set. In the example below, I would like the last two rows, with `order` = 3, to go into the test set.
| type | order |
|---|---|
| A | 1 |
| A | 1 |
| A | 2 |
| A | 2 |
| A | 3 |
| A | 3 |
The only way I can think of is to loop over each type, select its rows from the larger dataframe that contains multiple types, and append the results to a list, dataframe, or array. Is there an alternative, using `train_test_split` or something within pandas, that avoids the loop?
EDIT:
I would also like to keep the rows at the top, with orders 1 and 2, since I need them for training.
Is the solution below suitable? It splits the rows by whether their `order` equals the maximum `order` value or not.
Data:
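A minimal sketch of the example data, rebuilt from the table in the question. The variable name `df` is just an assumption for illustration.

```python
import pandas as pd

# Example data from the question: a single type "A" with orders 1 to 3,
# two rows per order value.
df = pd.DataFrame({
    "type": ["A"] * 6,
    "order": [1, 1, 2, 2, 3, 3],
})

print(df)
```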
Filter rows:
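One way this filter might look, sketched with a boolean mask instead of a loop. The second type `"B"` and the names `max_order`, `is_test`, `train`, and `test` are hypothetical additions for illustration. With only one type, comparing against `df["order"].max()` would be enough. The `groupby(...).transform("max")` step is an assumption about how to extend the idea to several types at once.

```python
import pandas as pd

# Type "A" comes from the question; type "B" is made up here to show
# that the maximum is taken per type, not over the whole frame.
df = pd.DataFrame({
    "type": ["A"] * 6 + ["B"] * 4,
    "order": [1, 1, 2, 2, 3, 3, 1, 2, 2, 2],
})

# For every row, the largest order value within that row's own type.
max_order = df.groupby("type")["order"].transform("max")

# Rows whose order equals their type's maximum go to the test set;
# all remaining rows (the earlier orders) stay in the training set.
is_test = df["order"] == max_order
test = df[is_test]
train = df[~is_test]

print(train)
print(test)
```

Note that this split is deterministic, unlike `train_test_split`, which samples rows (randomly by default), so the test fraction depends on how many rows share the highest `order` in each type.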