Sort categorical x-axis in a seaborn plot


I am trying to plot the top 30 percent values in a data frame using a seaborn scatter plot as shown below.

enter image description here

The reproducible code for the same plot:

import seaborn as sns

df = sns.load_dataset('iris')

#function to return top 30 percent values in a dataframe.
def extract_top(df):
    n = int(0.3*len(df))
    top = df.sort_values('sepal_length', ascending = False).head(n)

    return top

#storing the top values
top = extract_top(df)

#plotting
sns.scatterplot(data = top,
                x='species', y='sepal_length', 
                color = 'black',
                s = 100,
                marker = 'x',)

Here, I want to sort the x-axis in the order ['virginica','setosa','versicolor']. When I tried to pass order as a parameter to sns.scatterplot(), it raised AttributeError: 'PathCollection' object has no property 'order'. What is the right way to do it?

Please note: setosa is also a category in species in the dataframe, but none of its values fall in the top 30%. Hence, that label is not shown in the output of the reproducible code above. I still want that label on the x-axis, in the given order, as shown below:

enter image description here

There are 3 answers below

Best answer

scatterplot() is not the right tool for the job. It is a relational plot and has no order parameter. Since you have a categorical axis, use stripplot() instead, which accepts order. See the difference between relational and categorical plots at https://seaborn.pydata.org/api.html

sns.stripplot(data = top,
              x='species', y='sepal_length', 
              order = ['virginica','setosa','versicolor'],
              color = 'black', jitter=False)

enter image description here

Another answer

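For those wanting to make use of the extra arguments available in sns.scatterplot over sns.stripplot (size and style mappings for variables), it's possible to set the order of the x-axis simply by sorting the dataframe before passing it to seaborn, because categories are placed in the order in which they first appear. Note that sort_values returns a new dataframe, so pass its result to seaborn. The following will sort alphabetically.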

df.sort_values(feature)
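
Sorting alphabetically does not give the order asked for in the question. A minimal sketch of one way to sort in a custom order, assuming the column is converted to an ordered pandas categorical first; the small `top` frame here is a hypothetical stand-in for the real data, not the iris values. Sorting alone cannot create a tick for a category with no rows, so setosa would still be missing from a plot built this way unless the plotting function reads the category list from the column, which should be checked against your seaborn version.

```python
import pandas as pd

# Hypothetical stand-in for the `top` frame from the question.
top = pd.DataFrame({
    "species": ["versicolor", "virginica", "versicolor", "virginica"],
    "sepal_length": [7.0, 7.9, 6.9, 7.7],
})

order = ["virginica", "setosa", "versicolor"]

# An ordered categorical sorts by position in `order`, not alphabetically.
# Categories with no rows (setosa here) stay in the category list.
top["species"] = pd.Categorical(top["species"], categories=order, ordered=True)

# mergesort is stable, so rows keep their relative order within a species.
top_sorted = top.sort_values("species", kind="mergesort")

print(top_sorted["species"].astype(str).tolist())
print(list(top_sorted["species"].cat.categories))
```

The sorted frame can then be passed as data to sns.scatterplot in place of the unsorted one.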

Another answer

The error means that sns.scatterplot() does not accept order as one of its arguments. For the species setosa, you can add a placeholder row and use alpha to hide its scatter point while keeping the tick.

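import pandas as pd
import seaborn as sns

df = sns.load_dataset('iris')

#function to return top 30 percent values in a dataframe.
def extract_top(df):
    n = int(0.3*len(df))
    top = df.sort_values('sepal_length', ascending = False).head(n)

    return top

#storing the top values
top = extract_top(df)

#add a placeholder setosa row so that the category gets a tick
#(DataFrame.append returned a copy and was removed in pandas 2.0, so use pd.concat)
placeholder = top.iloc[[0]].assign(species='setosa')
top = pd.concat([top, placeholder], ignore_index=True)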
order = ['virginica','setosa','versicolor']

#plotting
for species in order:
    alpha = 1 if species != 'setosa' else 0
    sns.scatterplot(x="species", y="sepal_length",
                    data=top[top['species']==species],
                    alpha=alpha,
                    marker='x',color='k')

The output is:

output