How do I get the categories descriptor of a pandas categorical Series?


I'm fairly familiar with the pandas categorical dtype. But I'm having trouble accessing the nicely formatted, ordered description of the categories that appears at the bottom of a pandas Series when it is printed.

Note: I realize other questions have been asked that only get the unique category names. But those answers do not provide the formatting of the ordering (Categories (3, object): ['low' < 'medium' < 'high']).

If the Series is y, I've tried:

    y.cat.categories             # -> Index (but without the < ordering)
    y.cat.categories.to_numpy()  # -> array
    y.cat.ordered                # -> bool

y


Out[288]: 
    0      medium
    1         low
    2      medium
    3        high
    4      medium
            ...  
    437    medium
    438    medium
    439    medium
    440      high
    441       low
    Name: target, Length: 442, dtype: category
    Categories (3, object): ['low' < 'medium' < 'high']    # <<---- Trying to get this info 
                                                           # here programmatically!

What I'm trying to get is the last line in the output above.

There is 1 answer below.


Given df:

import numpy as np
import pandas as pd
scoreDtype = pd.CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df = pd.DataFrame({'score': np.random.choice('low medium high'.split(' '), 50)}).astype(scoreDtype)  # apply the ordered dtype

You can get the categories using .cat, the categorical accessor:

df['score'].cat.categories

Output:

Index(['low', 'medium', 'high'], dtype='object')
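That Index has no ordering markers on its own, but combined with .cat.ordered it carries everything the printed line shows. A minimal sketch that rebuilds the line from those two attributes, assuming string categories; `format_categories` is a hypothetical helper name, and it only approximates pandas' own footer (for example, it does not truncate long category lists the way pandas' repr does):

```python
import pandas as pd


def format_categories(s: pd.Series) -> str:
    """Rebuild a 'Categories (n, dtype): [...]' line from the .cat accessor.

    Hypothetical helper: an approximation of pandas' printed footer,
    not a pandas API.
    """
    cats = s.cat.categories
    # Ordered categoricals are joined with ' < ', unordered ones with ', '
    sep = " < " if s.cat.ordered else ", "
    items = sep.join(repr(c) for c in cats)
    return f"Categories ({len(cats)}, {cats.dtype}): [{items}]"


score_dtype = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)
s = pd.Series(["medium", "low", "high", "medium"], dtype=score_dtype)
print(format_categories(s))
print(format_categories(s.cat.as_unordered()))
```

For a short list of string categories like this one the result should read like the footer pandas prints, but the dtype label shown depends on how your pandas version stores string categories.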

That Index lacks the ordering, but the formatted line is the last line of the Series' string representation, so you can take it from to_string():

df['score'].to_string().rsplit('\n', 1)[-1]

Output:

"Categories (3, object): ['low' < 'medium' < 'high']"
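If you need this in several places, the same idea can be wrapped in a small function. This is a sketch under the assumption that pandas keeps appending the categories footer as the last line of Series.to_string() for categorical data (which is what the output above shows); `categories_line` is a hypothetical name, and the dtype check is only there to fail loudly on non-categorical input:

```python
import numpy as np
import pandas as pd


def categories_line(s: pd.Series) -> str:
    """Return the 'Categories (...)' footer pandas prints under a categorical Series.

    Hypothetical helper: relies on the footer being the last line of
    Series.to_string() for categorical dtypes.
    """
    if not isinstance(s.dtype, pd.CategoricalDtype):
        raise TypeError("expected a Series with a categorical dtype")
    # Split once from the right and keep only the final line
    return s.to_string().rsplit("\n", 1)[-1]


score_dtype = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)
df = pd.DataFrame(
    {"score": np.random.choice(["low", "medium", "high"], 50)}
).astype(score_dtype)
print(categories_line(df["score"]))
```

Because to_string() formats every row, this does some throwaway work on very long Series; for typical sizes that cost is negligible.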