Color seaborn swarmplot points with additional metadata beyond hue in boxplot

Say I have data that I want to show as a box plot overlaid with a swarm plot in seaborn, where the colors of the points convey additional information about the data.

Question: How can I get the box plots for a given x-axis value to sit close to each other (as hue does) without recoding x as a combination of the x-axis value and the hue value?

For example, here I want to overlay the points on the box plot and additionally color the points by 'sex':

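from matplotlib import pyplot as plt
import seaborn as sns

df = sns.load_dataset('titanic')  # assumed: seaborn's titanic example dataset

plt.figure(figsize=(5, 5))

# boxes grouped by 'embarked' within each class
sns.boxplot(x='class', y='age',
            hue='embarked', dodge=True, data=df)

# points colored by a second variable, 'sex'
sns.swarmplot(x='class', y='age',
              dodge=True,
              color='0.25',
              hue='sex', data=df)

plt.legend(bbox_to_anchor=(1.5, 1))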

EDIT: The idea would be to have something that looks like the 'S' box for 'Third' in the plot (I made a fake example in PowerPoint, so the hue in the boxplot and the swarmplot is the same, to overlay the points on the appropriate boxes).

Is there a way to make this plot without first recoding the x-axis as 'first-S', 'first-C', 'first-Q', 'second-S', etc. and then adding hue by 'sex' in both plots?
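A minimal sketch of the combined-category workaround that the question wants to avoid, for comparison. It assumes a small made-up frame (`toy`) with the same column names as the titanic data, and the combined column name `class_embarked` is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up frame (assumption) with the same column names as the titanic data
toy = pd.DataFrame({
    "class": ["First", "Second", "Third"] * 4,
    "embarked": ["S", "C"] * 6,
    "sex": ["male", "female", "female", "male"] * 3,
    "age": [22, 38, 26, 35, 54, 2, 27, 14, 4, 58, 20, 39],
})

# Build one combined category per (class, embarked) pair, e.g. 'First-S'
toy["class_embarked"] = toy["class"] + "-" + toy["embarked"]
order = sorted(toy["class_embarked"].unique())

fig, ax = plt.subplots(figsize=(8, 5))
# One box per combined category; the boxes themselves carry no hue
sns.boxplot(x="class_embarked", y="age", order=order,
            color="0.9", showfliers=False, data=toy, ax=ax)
# Points overlaid on the same categories, colored by 'sex'
sns.swarmplot(x="class_embarked", y="age", hue="sex", order=order,
              data=toy, ax=ax)
```

The drawback is visible in the tick labels: the grouping by class is no longer expressed by the axis itself, only by the label text.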

There is 1 best solution below

Using original x as col and hue as x

To work with two types of hue, seaborn's alternative is to create a FacetGrid. The original x= then becomes the col= (or the row=), and one of the hues becomes the new x=.

Here is an example. Note that aspect= controls the width of the individual subplots (the width being height*aspect).

from matplotlib import pyplot as plt
import seaborn as sns

df = sns.load_dataset('titanic')
g = sns.catplot(kind='box', x='embarked', y='age', hue='sex', col='class',
                dodge=True, palette='spring',
                height=5, aspect=0.5, data=df)
g.map_dataframe(sns.swarmplot, x='embarked', y='age', hue='sex', palette=['0.25'] * 2, size=2, dodge=True)
for ax in g.axes.flat:
    # use title as x-label
    ax.set_xlabel(ax.get_title())
    ax.set_title('')
    # remove y-axis except for the left-most columns
    if len(ax.get_ylabel()) == 0:
        ax.spines['left'].set_visible(False)
        ax.tick_params(axis='y', left=False)
plt.subplots_adjust(wspace=0)
plt.show()

[Figure: combining sns.boxplot and sns.swarmplot with two hue variables]
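The sizing rule mentioned above (each subplot is height*aspect wide) can be checked with a short sketch. It assumes a small made-up frame (`toy`) in place of the titanic data, no hue (so no outside legend changes the figure size), and a seaborn version that exposes `g.figure` (0.11.2 or later):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# Small made-up frame (assumption) with three classes, so three columns
toy = pd.DataFrame({
    "class": ["First", "Second", "Third"] * 4,
    "embarked": ["S", "C"] * 6,
    "age": [22, 38, 26, 35, 54, 2, 27, 14, 4, 58, 20, 39],
})

height, aspect = 5, 0.5
g = sns.catplot(kind="box", x="embarked", y="age", col="class",
                height=height, aspect=aspect, data=toy)

n_cols = g.axes.shape[1]                      # one subplot per class
fig_width, fig_height = g.figure.get_size_inches()
expected_width = n_cols * height * aspect     # subplot width times column count

print("figure size (inches):", fig_width, fig_height)
print("expected width:", expected_width)
```

Lowering `aspect` therefore narrows every subplot while keeping the height fixed, which is what brings the class groups closer together.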

Only using hue for the swarmplot, without dodge

Here is a variant where the boxplot doesn't use hue, but the swarmplot does. A bit more padding can be added inside the subplots, and the boxplots can be made to touch via width=1. Suppressing the outliers of the boxplot looks cleaner, as they would otherwise be drawn on top of the same outlying points in the swarmplot.

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('titanic')
df['embarked'] = pd.Categorical(df['embarked'], ['S', 'C', 'Q'])  # force a strict order
g = sns.catplot(kind='box', x='embarked', y='age', col='class',
                dodge=True, palette='summer', width=1, showfliers=False,
                height=5, aspect=0.5, data=df)
g.map_dataframe(sns.swarmplot, x='embarked', y='age', hue='sex', palette=['b', 'r'], size=2, dodge=False)
g.add_legend()
for ax in g.axes.flat:
    # use title as x-label
    ax.set_xlabel(ax.get_title())
    ax.set_title('')
    # remove y-axis except for the left-most columns
    if len(ax.get_ylabel()) == 0:
        ax.spines['left'].set_visible(False)
        ax.tick_params(axis='y', left=False)
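# catplot shares the x-axis between subplots by default, so changing the
# limits of the last subplot applies to all of them
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin - 0.2, xmax + 0.2)  # add a bit more spacing between the groups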
plt.subplots_adjust(wspace=0)
plt.show()

[Figure: catplot using hue for swarmplot]