I have a process I repeat often, but I do not know how to write a dynamic function to make it easier. I need to take a variable (age, for example) and group it into discrete bins, each holding at least 1000 units of another column (weight). I need this to work for any variable, and I need to control the weight sum: if I want to bin in 500-unit or 2000-unit increments instead, that should be a changeable parameter.
Here is a sample data set:
import pandas as pd
import numpy as np
age = np.arange(0,30,1)
weight = np.linspace(200, 2500,30)
bin_dict = {'age':age, 'weight':weight}
df = pd.DataFrame(bin_dict)
Now let's say I want to bin age so that each bin holds no less than 1000 units of weight. This is what the result would look like:
df['bin_aged'] = pd.cut(df['age'],
                        bins=[-np.inf, 3, 5, 7, 9, 11, 12, 13, 14, 15, 16, 17,
                              18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, np.inf],
                        labels=[3, 5, 7, 9, 11, 12, 13, 14, 15, 16, 17,
                                18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
This is what it would look like if I grouped by the new binned column:
df.groupby('bin_aged').agg({'weight':'sum'})
Is this possible?
I figured this out....
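The solution the author arrived at is not shown here. As an illustration only, here is one possible greedy sketch: walk the sorted values of the variable, accumulate the weight, and close a bin as soon as the running sum reaches the threshold. The function name `bin_by_min_weight` and its parameters are hypothetical, and it assumes a non-empty frame with a numeric variable column. Any leftover tail that falls short of the threshold is folded into the previous bin.

```python
import numpy as np
import pandas as pd


def bin_by_min_weight(df, var, weight, min_weight=1000):
    """Hypothetical helper: bin `var` so each bin sums to >= `min_weight` of `weight`."""
    # Total weight per distinct value of `var`, in ascending order.
    totals = df.groupby(var)[weight].sum().sort_index()

    edges = []      # inclusive upper edge of each closed bin
    running = 0.0
    for value, w in totals.items():
        running += w
        if running >= min_weight:
            edges.append(value)
            running = 0.0

    # Make the last bin end at the largest value: this folds a short
    # leftover tail into the previous bin (or creates a single bin
    # when the threshold is never reached).
    last = totals.index[-1]
    if edges:
        edges[-1] = last
    else:
        edges.append(last)

    # Open-ended outer bins, labelled by each bin's upper edge.
    bins = [-np.inf] + edges[:-1] + [np.inf]
    return pd.cut(df[var], bins=bins, labels=edges)


# Usage with the sample data from the question.
df = pd.DataFrame({'age': np.arange(0, 30, 1),
                   'weight': np.linspace(200, 2500, 30)})
df['bin_aged'] = bin_by_min_weight(df, 'age', 'weight', min_weight=1000)
print(df.groupby('bin_aged', observed=True).agg({'weight': 'sum'}))
```

On the sample data with `min_weight=1000`, this produces the same bin edges and labels as the hand-written `pd.cut` call above. Changing `min_weight` to 500 or 2000 changes the increment, and passing a different column name for `var` bins a different variable.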