I have a pandas dataframe like this:
  ID  Value
0  a      2
1  a      4
2  b      6
3  c      8
4  c     10
5  c     12
I would like to sample equally from the ID groups. I know I can group the data frame by ID and then specify the number of rows I want to sample from each group like this:
df.groupby("ID").sample(n=2, replace = True)
However, I just want the probability of sampling from a group to be the same, not necessarily the exact same number of rows.
If you want to sample N rows with about the same probability of sampling each group, you could oversample per group and then sample again:
Example output with N = 10:
Proportion with N = 100:
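The oversample-then-resample idea described above could look roughly like the sketch below. It is an assumption about how the two steps fit together, not necessarily the exact code of the original answer. The helper name `sample_equal_groups` and its `key` and `seed` parameters are hypothetical and only there for illustration. Because the output is random, no expected result is shown.

```python
import pandas as pd

# The data frame from the question.
df = pd.DataFrame({
    "ID": ["a", "a", "b", "c", "c", "c"],
    "Value": [2, 4, 6, 8, 10, 12],
})


def sample_equal_groups(frame, n, key="ID", seed=None):
    """Hypothetical helper: draw n rows so each group is about equally likely."""
    # Step 1: oversample. Draw n rows with replacement from every group,
    # so each group contributes an equally sized pool of n rows.
    oversampled = frame.groupby(key).sample(n=n, replace=True, random_state=seed)
    # Step 2: sample again. Draw n rows from the pooled rows; since every
    # group holds the same number of rows in the pool, each draw is about
    # equally likely to come from any group.
    return oversampled.sample(n=n, random_state=seed)


# N = 10: a small sample of rows.
out = sample_equal_groups(df, n=10, seed=0)

# N = 100: share of sampled rows per group, expected to be close to 1/3 each.
proportions = sample_equal_groups(df, n=100, seed=0)["ID"].value_counts(normalize=True)
```

The group counts in the result are not forced to be identical; only the chance of picking each group is roughly equal, which is what the question asks for. Pass `seed=None` to get a different draw on every call.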