From the following data frame, how can a list of owners with more than two pets be obtained without resorting to copying/pasting or intermediate data structures?
df = pd.DataFrame([['Jack', 'fuzzy',12], ['Jack', 'furry',13], ['Joe', 'scratchy',3], ['Jack', 'chirpy',40], ['Jeff', 'slithery',9], ['Jack', 'swimmy',1], ['Joe', 'feathery',14], ['Joe', 'oinky',11], ['Jack', 'stampy',1]],
columns=['Owner', 'Pet', 'Age'])
print(df)
Owner Pet Age
0 Jack fuzzy 12
1 Jack furry 13
2 Joe scratchy 3
3 Jack chirpy 40
4 Jeff slithery 9
5 Jack swimmy 1
6 Joe feathery 14
7 Joe oinky 11
8 Jack stampy 1
Getting a boolean series matching the condition is easy:
df.groupby('Owner').count()['Pet']>2
Owner
Jack True
Jeff False
Joe True
Name: Pet, dtype: bool
Actually distilling the matches (Jack and Joe) can be done by copying pasting the groupby
statement:
df.groupby('Owner').count()['Pet'][df.groupby('Owner').count()['Pet']>2]
Owner
Jack 5
Joe 3
Name: Pet, dtype: int64
But this is a pain if the conditional statement is long because every change needs to be repeated. The only other way discovered so far is to throw the series back into a data frame and use query()
, but this feel impossibly hackish:
pd.DataFrame(df.groupby('Owner').count()['Pet']).query('Pet > 2')
Pet
Owner
Jack 5
Joe 3
Is there a better way than these?
You could use the
.loc
indexer with a filtering function.Alternatively, you could use the
compress
method.