Meaning of seaborn.pairplot output


In a lecture I saw the following example of a dataset with a few features such as PCR_01 and PCR_03, plus a label spread (this is y, and it can be 1 or -1).

Given the following code:

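import numpy as np
import seaborn as sns

# train is the lecture's DataFrame with the columns PCR_01, PCR_03 and spread
g = sns.pairplot(train[['PCR_01', 'PCR_03', 'spread']], plot_kws={"s": 12}, hue='spread', palette={-1: 'red', 1: 'green'})
for ax in np.ravel(g.axes):
    ax.grid(alpha=0.5)  # light grid on every subplot
g.fig.set_size_inches(12, 8)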

we get 4 graphs:

[Figure: 2x2 pair plot of PCR_01 and PCR_03, colored by spread]

Yet I can't fully understand them.

  1. Shouldn't the top right and bottom left graphs be exactly the same, except that the green dots are red and vice versa?

  2. What do the top left and bottom right graphs mean? For example, the top left has PCR_01 on both the x and y axes, so I was expecting something like the function y=x.

Answer by JohanC:

The diagonal plots are KDE plots (kdeplot). They approximate a probability density function as a sum of narrow Gaussians, one per data point. The range of values is shown in the horizontal direction and their "density" in the vertical direction (but the density isn't what the y-axis tick labels show).

Gaussians smooth out their input and can't represent a sharp cut-off, so the curves taper off at the edges instead of dropping straight down. The values in your plot are probably close to uniformly distributed (which is also visible in the scatter plots). When hue is used, the KDEs are scaled down according to the number of entries with that hue. PCR_01 has about the same number of entries for both values of spread, while PCR_03 has somewhat fewer green ones.
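To make the "sum of narrow Gaussians" idea and the hue scaling concrete, here is a minimal NumPy sketch. It is not seaborn's actual implementation (seaborn chooses the bandwidth automatically); the data, the group sizes, the bandwidth value and the helper name gaussian_kde_1d are all made up for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussians, one centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = scaled distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
# Made-up data: uniform on [0, 1], with unequal group sizes
red = rng.uniform(0.0, 1.0, size=600)
green = rng.uniform(0.0, 1.0, size=400)

grid = np.linspace(-1.0, 2.0, 601)
dx = grid[1] - grid[0]
bandwidth = 0.05  # assumed value, chosen by hand

kde_red = gaussian_kde_1d(red, grid, bandwidth)
kde_green = gaussian_kde_1d(green, grid, bandwidth)

# On its own, each curve encloses an area of about 1
area_red = float(kde_red.sum() * dx)

# Scaling by group share: the areas become about 0.6 and 0.4
n_total = len(red) + len(green)
scaled_red = kde_red * len(red) / n_total
scaled_green = kde_green * len(green) / n_total
area_scaled_red = float(scaled_red.sum() * dx)
area_scaled_green = float(scaled_green.sum() * dx)

# The data stop sharply at 0, but the smoothed curve leaks below 0
leak = float(kde_red[grid < 0.0].sum() * dx)
```

As far as I know, this scaling corresponds to kdeplot's default common_norm=True; passing diag_kws={"common_norm": False} to pairplot should normalize each hue separately instead.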

The off-diagonal plots are indeed mirror images of each other (with the x- and y-axes interchanged); the colors stay the same. Having both can be useful to track one feature along a row or column while comparing it to many other features. There are options to leave out the upper-right triangle (corner=True), and functions to map other kinds of visualization onto the grid.

A pair plot is often used to quickly get a grasp of a new dataset you are confronted with. One gets indications about distribution of values, weird outliers, expected and unexpected correlations, ... It is a starting point to dive deeper.

The seaborn documentation gives a high-level overview of the different visualization options. There are also nice introductory videos, e.g. by Kimberley Fessel or Andy McDonald.