Apply a function with multiple arguments to a large Pandas dataframe efficiently

Asked by PeteyPablo on 27 March 2023 at 17:53

My dataframe (1,957,046 x 4) is of baby names by year, count and gender, as follows:

Year	Name	Gender	Count
1880	A	F	1
1880	B	M	5
1880	C	F	2
...	...	...	...
2018	X	M	7
2018	Y	F	4
2018	Z	M	2

I am trying to create a new column called "popularity", defined as the number of babies with a given name and gender per million babies born in that year. E.g., if Mary (F) is recorded 50,000 times in 1950 and 2,000,000 babies were born that year, then its popularity score is 25,000 (per million).

This is what I've tried.

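def popularity(Year,Name,Gender):
    # Babies with this exact year/name/gender combination (filters the whole df)
    count = df[(df['Year']==Year) & (df['Name']==Name) & (df['Gender']==Gender)].Count.sum()
    # All babies recorded in that year (filters the whole df again)
    total = df[(df['Year']==Year)].Count.sum()
    return round((count/total)*1e6)

# Calls popularity() once per row
df['popularity'] = df.apply(lambda x: popularity(x.Year,x.Name,x.Gender),axis=1)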

I also tried doing 'df.groupby(...)' but that didn't work because the function has multiple arguments. Even if I truncate the data as df[0:100], the function takes 15+ seconds to run, which would equate to about 84 hours of runtime for the entire df.

Any suggestions for economizing would be appreciated.
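
For reference, one common way to avoid the row-by-row call is to compute both sums with grouped transforms, so each total is calculated once per group and broadcast back to the rows. The following is only a sketch under assumptions: the toy frame below stands in for the real data (its values are made up), and the column names are taken from the table above.

```python
import pandas as pd

# Toy data standing in for the real 1,957,046-row frame (values are made up).
df = pd.DataFrame({
    "Year":   [1880, 1880, 1880, 2018, 2018, 2018],
    "Name":   ["A", "B", "C", "X", "Y", "Z"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Count":  [1, 5, 2, 7, 4, 2],
})

# Total babies per year, aligned back to every row of that year.
year_total = df.groupby("Year")["Count"].transform("sum")

# Count per (Year, Name, Gender); this equals Count itself when each
# combination appears only once, but also handles duplicated rows.
name_count = df.groupby(["Year", "Name", "Gender"])["Count"].transform("sum")

# Popularity per million, rounded to the nearest integer.
df["popularity"] = (name_count / year_total * 1e6).round().astype(int)
```

Because `transform("sum")` returns a Series aligned to the original index, no per-row function call or repeated filtering of the full frame is needed.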
