I have a dataframe of categorical variables in Python Pandas:
import pandas as pd
cat_df = pd.DataFrame({'item':['bed', 'lamp', 'candle', 'chair', 'bed', 'candle', 'lamp'],
'location' :['home', 'home', 'home', 'home', 'home', 'home', 'home' ],
'status' :['new', 'used', 'used', 'new', 'new', 'used', 'new' ]})
cat_df = cat_df.astype('category')
print(cat_df.dtypes)
cat_df.describe().transpose()
and I need an output that shows all the values of categorical variables plus the frequency and percentage
# item bed 2 28.57
# item lamp 2 28.57
# item candle 2 28.57
# item chair 1 14.28
# location home 7 100
# status new 4 57.14
# status used 3 42.85
The real dataset has a lot of variables so I need a scalable solution.
Use
DataFrame.melt
withGroupBy.size
for counts and then divide bysum
to new column byGroupBy.transform
: