I want to get the list of values from col2 that belong to the same groupId, given the corresponding value in col1. A col1 value can belong to multiple groups, and in that case only the top-most group should be considered (group 2 but not group 3 in my example). Col1 values are always identical within the same groupId.
| groupId | col1 | col2 |
|---|---|---|
| 2 | a | 10 |
| 1 | b | 20 |
| 2 | a | 30 |
| 1 | b | 40 |
| 3 | a | 50 |
| 3 | a | 60 |
| 1 | b | 70 |
My current solution takes over 30s for a df with 2000 rows and 32 values to search for in col1 ('a' in this case):

    group_id_groups = df.groupby('groupId')
    for group_id, group in group_id_groups:
        col2_values = list(group[group['col1'] == 'a']['col2'])
        if col2_values:
            print(col2_values)
            break

result: [10, 30]
The `sort` parameter of `groupby` defaults to `True`, so groups are iterated in ascending `groupId` order and the first group that contains the value is the topmost one (group 2 for `a` in this example). You can change `col_to_search` to `b` and get the other answer.
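
As a rough illustration of the idea above, here is a minimal sketch that avoids the Python-level loop over groups. It rebuilds the example table from the question. The helper name `first_group_values` is made up for this example, and `col_to_search` is assumed to hold the col1 value being looked up. It picks the smallest matching `groupId`, which mirrors the sorted iteration order of `groupby`; this is one possible approach under those assumptions, not necessarily the fastest one.

```python
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame({
    "groupId": [2, 1, 2, 1, 3, 3, 1],
    "col1": ["a", "b", "a", "b", "a", "a", "b"],
    "col2": [10, 20, 30, 40, 50, 60, 70],
})


def first_group_values(df, col_to_search):
    """Return col2 values from the lowest-numbered group containing col_to_search.

    Hypothetical helper written for this example only.
    """
    # Keep only the rows whose col1 matches the value being searched for.
    matches = df[df["col1"] == col_to_search]
    if matches.empty:
        return []
    # Smallest groupId among the matches, mirroring groupby's default sort=True order.
    first_group_id = matches["groupId"].min()
    # Select col2 for that single group and convert to a plain list.
    return matches.loc[matches["groupId"] == first_group_id, "col2"].tolist()


result_a = first_group_values(df, "a")
result_b = first_group_values(df, "b")
print(result_a)
print(result_b)
```

If "top-most" should instead mean the group that appears first in row order rather than the one with the smallest `groupId`, replacing `matches["groupId"].min()` with `matches["groupId"].iloc[0]` would be one way to express that.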