Seaborn PairGrid with swarm plots

I have a dataset (kinda) like this:

f1  f2  f3     value
4   2   3      0.927252
1   3   0      0.153415
0   1   1      0.928820
1   0   4      0.933250
0   4   3      0.397307
...

I want to produce a Seaborn PairGrid with stripplots with jitter or swarmplots for each pair of features f1, f2 and f3, and use value for the hue.

Plots on the diagonal should look something like this:

[Image: 1D strip plot]

Which I created with:

df = ...  # My dataset
sns.stripplot(x="f1", y="f1", hue="value", data=df, jitter=True,
              palette=sns.light_palette("red", len(df)),
              hue_order=sorted(df["value"])).legend().remove()

And off-diagonal plots would be like this:

[Image: 2D strip plot]

Which, likewise, I made with:

df = ...  # My dataset
sns.stripplot(x="f1", y="f2", hue="value", data=df, jitter=True,
              palette=sns.light_palette("red", len(df)),
              hue_order=sorted(df["value"])).legend().remove()

What I'm trying, therefore, is:

import seaborn as sns
df = ...  # My dataset
g = sns.PairGrid(df, hue="value", palette=sns.light_palette("red", len(df)),
                 hue_order=sorted(df["value"]), vars=df.columns[:-1])
g.map_diag(lambda x, **kwargs: sns.stripplot(x, x, **kwargs), jitter=True)
g.map_offdiag(sns.stripplot, jitter=True)

However, this is yielding:

[Image: strip plot pair grid]

I don't really know what I'm missing here. I can still make the plots myself and put them into my own subplots, but avoiding that is the whole point of the pair grid. Are these kinds of plots not supported on a grid for some reason?

Best answer

Contrary to what the name may suggest, the hue parameter does not define a color. It is better to think of it as a "further dimension" of the data. In many cases this further dimension is visualized by color, but that is not true for every plot.
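To make the "further dimension" reading concrete, here is a small sketch that maps a function over a PairGrid with hue="value" and counts how often it is called. The record_call function is a hypothetical stand-in for a plotting function (it draws nothing), and the random data only mimics the shape of the dataset in the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Toy data shaped like the question's dataset (the values themselves are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(4**3, 3)), columns=["f1", "f2", "f3"])
df["value"] = rng.random(len(df))

calls = []

def record_call(x, y, **kwargs):
    """Hypothetical stand-in for a plotting function: it only records
    how many points it was handed on each call."""
    calls.append(len(x))

# With hue on the grid, the data is split by hue level before the function is called
g = sns.PairGrid(df, hue="value", vars=["f1", "f2", "f3"])
g.map(record_call)

n_panels = g.axes.size            # 3 x 3 panels
n_levels = df["value"].nunique()  # one hue level per distinct value
print(n_panels, n_levels, len(calls), max(calls))
```

The function is called once per panel and per hue level, and each call only sees the rows belonging to that one level. With a continuous column nearly every row is its own level, so the function never sees the dataset as a whole.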

To get the desired PairGrid, we can leave the hue out, so that all values are shown in every panel.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

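# Toy data: three integer features in 0..4 and a continuous "value" column
df = pd.DataFrame(np.random.randint(0, 5, size=(4**3, 3)), columns=["f1", "f2", "f3"])
df["value"] = np.random.rand(len(df))

# No hue on the grid, so every panel receives the full dataframe
g = sns.PairGrid(df, vars=df.columns[:-1])
g.map(sns.stripplot, jitter=True, size=3)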

plt.show()

The point here is that the hue of the PairGrid is completely different from the hue of the stripplot. You can use the hue of the stripplot itself to colorize the points in each individual plot. The hue of the PairGrid, by contrast, divides the dataframe into further categories, one per hue value. That is unwanted here: the value column contains a continuous variable, so you would end up with as many categories as there are distinct values in that column.
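If the points should still be colored by value, one option is to leave the grid-level hue out and draw each panel directly with matplotlib. The following is only a sketch under assumptions that are not in the question: plain Axes.scatter is swapped in for stripplot, and the jitter helper, its width, the "Reds" colormap and the marker size are all chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Toy data shaped like the question's dataset (the values themselves are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(4**3, 3)), columns=["f1", "f2", "f3"])
df["value"] = rng.random(len(df))

# No hue here: the grid only provides the layout and shared axes
g = sns.PairGrid(df, vars=["f1", "f2", "f3"])

def jitter(values, width=0.15):
    """Hypothetical helper: add uniform noise so overlapping points spread out."""
    return values + rng.uniform(-width, width, size=len(values))

for i, y_var in enumerate(g.y_vars):
    for j, x_var in enumerate(g.x_vars):
        ax = g.axes[i, j]
        # Color each point by the continuous "value" column via a colormap
        points = ax.scatter(jitter(df[x_var]), jitter(df[y_var]),
                            c=df["value"], cmap="Reds", s=12)
        # Label only the outer axes, as a pair grid usually does
        if i == len(g.y_vars) - 1:
            ax.set_xlabel(x_var)
        if j == 0:
            ax.set_ylabel(y_var)

# One shared colorbar for the whole grid
fig = g.axes[0, 0].figure
fig.colorbar(points, ax=g.axes.ravel().tolist(), label="value")
```

Call plt.show() afterwards, or save the figure, to see the result. On the diagonal each feature is plotted against itself, so the jittered points cluster along the diagonal line, similar to the 1D strip plot in the question.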