Problems with label names of a groupby.value_counts() object

143 Views Asked by At

first of all I apologize for my english and thanks for your time.

I´ve a problem with the labels from df or series to draw a catplot with seaborn.

I have a df like this (from a data that which was modified with pd.melt)

    cardio  variable    value
0   0   cholesterol 0
1   1   cholesterol 1
2   1   cholesterol 1
3   1   cholesterol 0
4   0   cholesterol 0
... ... ... ...
419995  0   overweight  1
419996  1   overweight  1
419997  1   overweight  1
419998  1   overweight  1
419999  0   overweight  0

And i need to draw a sns.catplot with that data grouped by 'cardio' and 'variable', and then counted by value. So, I wrote this code:

df_cat = df_cat.groupby(['cardio','variable']).value_counts()
df_cat2=df_cat.to_frame()

The problem is that its returns a df with 2 levels of labels (the top label its the '0') like this:

                                0
cardio  variable    value   
0       active         1    28643
                       0     6378
        alco           0    33080
                       1     1941
        cholesterol    0    29330
                       1     5691

As sns.catplot needs to use dataframe and correctly recognize column names, this '0' column is causing problems to create the catplot. I need to rename the columns names and remove this '0' label from the last df or name the counts column when i use groupby.value_counts() in the first df because I think that the '0' its created automatically since the 'counts' column has no name.

I expect something like this:

cardio  variable    value   count
0       active         1    28643
                       0    6378
        alco           0    33080
                       1    1941
        cholesterol    0    29330
                       1    5691
2

There are 2 best solutions below

1
On BEST ANSWER

Value_counts returns a series with a multi-index. Just reset the index and rename the column--fake data added below in example.

import pandas as pd
import numpy as np

n = 50
cats = ['cholestoral', 'active', 'alco']

data = {'cardio': np.random.randint(2, size=n), 
        'variable': np.random.choice(cats, size=n), 
        'value':np.random.randint(2, size=n)}

df = pd.DataFrame.from_dict(data)

df_plot = (df
    .value_counts(subset=['cardio','variable', 'value'])
    .reset_index()
    .rename(columns={0:'counts'})
)
5
On

pandas' groupby is sql-like if you want to fill the 'na' for catplot's input, I would just directly use sql output as the catplot input, please import sql module and try=)

!pip install pandasql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
query="""
select variable,value,count(value)
from df
group by variable, value
"""
df1=pysqldf(query)
import seaborn as sns
sns.catplot(data=df1,x='variable',y='count(value)',col='value',kind='bar')