I have this dataframe
my_data = {"A" : ['x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z'],
"B" : ['i', 'i', 'j', 'j', 'j', 'k', 'k', 'k', 'k'],
"C" : [0, 1, 1, 2, 0, 3, 3, 3, 0]}
pd.DataFrame(my_data)
A and B are independent groups and C is a category variable, I would like to find the percentage of C by group A and B.
There are so many categories of C, so I try to solve without using dummy variable.
like this
result = {"A" : ['x', 'x', 'y', 'y', 'z'],
"B" : ['i', 'j', 'j', 'k', 'k'],
"0" : [1/2, 0, 1/2, 0, 1/2],
"1" : [1/2, 1, 0, 0, 0],
"2" : [0, 0, 1/2, 0, 0],
"3" : [0, 0, 0, 1, 1/2]}
pd.DataFrame(result)
If there are no overlapping groups (e.g., A = x, B = k), they should not be included in the result table.
I solve this problem for one group, but I stucked in two independent group..
You can use
GroupBy.value_counts
withnormalize=True
, and reshaping:output: