Seaborn stripplot variable marker sizes not working with categories


I am trying to plot a stripplot with 3 categories (column assigned to x) and have the marker sizes vary based on a column in a dataframe.

However, the sizes don't line up, even when I set sizes to the same column as y. (I am using sizes = df["col"]; with sizes = "col" I get the error TypeError: len() of unsized object.) With this setup I'd expect smaller markers at the bottom and larger markers at the top, since sizes and y hold the same values. Instead, there doesn't appear to be any correlation between the size of a marker and its position on the y-axis.

After some investigation, pulling out the PathCollections and comparing the actual positions (.get_offsets()) with the size values (.get_sizes()), it is clear that the same array of sizes is being used for each category.
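A minimal sketch of that check, assuming a small made-up DataFrame (the column names match the simplified code further down; the data itself is hypothetical) and a non-interactive Matplotlib backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs headless
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: three categories with 20 values each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category_col": np.repeat(["a", "b", "c"], 20),
    "value_col": rng.uniform(1, 100, 60),
})

ax = sns.stripplot(data=df, x="category_col", y="value_col",
                   sizes=df["value_col"])

# Assumption: stripplot adds the points of each category as a separate
# PathCollection. For every collection, compare the number of points
# with the length of the size array attached to it.
report = []
for coll in ax.collections:
    offsets = coll.get_offsets()  # (n_points, 2) array of x/y positions
    sizes = coll.get_sizes()      # marker areas in points**2
    report.append((len(offsets), len(sizes)))
    print(f"{len(offsets)} points, {len(sizes)} sizes")
```

If the size array is longer than the number of points in a collection, the sizes cannot be matched one-to-one with the rows of that category.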

Is this feature not properly implemented yet? I tried assigning the categories as hue instead of x but I get a StopIteration error. The only solution I've found is to iterate through each category and plot it on a separate axis in a row of axes. This is clunky and surely there's a better way.

Here is a very simplified version of my code:

sns.stripplot(data = df,
              x = 'category_col',
              y = 'value_col',
              sizes = df['value_col'])

1 Answer

The sns.stripplot documentation doesn't list sizes= as a parameter. The function shares some code with sns.swarmplot, which relies on all points having the same size. But hue should work without problems, at least in the latest version. (In seaborn, hue comes in two flavors: either a limited set of values interpreted as categories with individual colors, or a numerical range that is mapped to colors through a colormap.)

Here is how hue could be used, starting from seaborn's 'tips' dataset:

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')
sns.set_style('white')
ax = sns.stripplot(data=tips, x='day', y='total_bill', hue='tip')
sns.despine()
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1.02, 1.02))
plt.tight_layout()
plt.show()

[Figure: sns.stripplot with hue]

A scatterplot does have size= (the column that determines the size of the dots) and sizes= (the range of sizes) parameters. By converting the categorical x column to numbers and manually adding some jitter, you can create a strip plot with sns.scatterplot.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

tips = sns.load_dataset('tips')
tips['day'] = tips['day'].astype('category')  # make sure the column uses pandas categorical type
# make a numerical column and add some jitter
tips['day_codes'] = tips['day'].cat.codes + np.random.uniform(-0.3, 0.3, len(tips))

sns.set_style('white')
ax = sns.scatterplot(data=tips, x='day_codes', y='total_bill', hue='smoker',
                     size='tip', sizes=(1, 500))
ax.set_xlabel('')
ax.set_xticks(np.arange(len(tips['day'].cat.categories)), tips['day'].cat.categories)
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1.02, 1.02))
sns.despine()
plt.tight_layout()
plt.show()

[Figure: sns.scatterplot imitating stripplot]
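The same idea can be wrapped in a small function and applied to the setup from the question, where the marker size follows the y column. This is only a sketch: sized_stripplot and its parameters are made-up names (not part of seaborn), and the DataFrame is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs headless
import numpy as np
import pandas as pd
import seaborn as sns


def sized_stripplot(data, x, y, size, sizes=(10, 200), jitter=0.3, seed=0, ax=None):
    """Hypothetical helper: strip-like scatter plot with per-point marker sizes."""
    cats = pd.Categorical(data[x])  # category codes become the x positions
    rng = np.random.default_rng(seed)
    plot_df = data.assign(
        _x_pos=cats.codes + rng.uniform(-jitter, jitter, len(data))
    )
    ax = sns.scatterplot(data=plot_df, x="_x_pos", y=y,
                         size=size, sizes=sizes, ax=ax)
    # put the category names back on the x-axis
    ax.set_xticks(np.arange(len(cats.categories)))
    ax.set_xticklabels(cats.categories)
    ax.set_xlabel(x)
    return ax


# Synthetic stand-in for the question's DataFrame
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "category_col": np.repeat(["a", "b", "c"], 20),
    "value_col": rng.uniform(1, 100, 60),
})
ax = sized_stripplot(df, x="category_col", y="value_col", size="value_col")
```

Because all points are drawn by a single scatterplot call, each marker size belongs to its own row, so larger values end up both higher on the y-axis and larger. Call ax.figure.savefig(...) or switch to an interactive backend and use plt.show() to view the result.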