Plotting two distributions in the marginal plots of JointGrid

I'm trying to draw a scatter plot with two marginal distribution plots. I have two datasets and want to compare them in a single figure. I can draw the scatter plots of both datasets, but I can't draw the distribution of the second dataset in the marginal axes. Thanks in advance.

So far, I can plot the scatter and distribution plots of the first dataset, plus the scatter plot of the second dataset, using the code below.

g = sns.JointGrid(data=data1, x="observed", y="predicted")
g.plot(sns.scatterplot, sns.distplot)
g.ax_joint.scatter(data=data2, x="observed", y="predicted", c='r')

The figure below shows what I get running my code: [figure]

When I try to add distribution plots of the second dataset by appending the code below, the distribution plots of the second dataset replace those of the first. I want to show both for comparison.

g.ax_marg_x.hist(data=data2, x= "observed")
g.ax_marg_y.hist(data=data2, x= "observed", orientation="horizontal")

This is what I get by doing so: [figure]

There are 2 answers below.

Best answer

Plotting data1 explicitly on the marginal axes as well seems to do the trick. Note that the y marginal should show "predicted", not "observed".

g.ax_marg_x.hist(data=data1, x="observed")
g.ax_marg_x.hist(data=data2, x="observed")
g.ax_marg_y.hist(data=data1, x="predicted", orientation="horizontal")
g.ax_marg_y.hist(data=data2, x="predicted", orientation="horizontal")
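
For reference, here is a self-contained sketch of this approach. The random test data, the shared `bins_x`/`bins_y` edges, and the `alpha` value are assumptions added for illustration and are not part of the answer above. The marginals are drawn with Matplotlib's `hist` directly, so `JointGrid` only supplies the axes layout.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# hypothetical test data standing in for the asker's data1 and data2
rng = np.random.default_rng(0)
data1 = pd.DataFrame({"observed": rng.random(50), "predicted": rng.random(50)})
data2 = pd.DataFrame({"observed": rng.random(90), "predicted": rng.random(90)})

# common bin edges over the pooled values, so both histograms line up
bins_x = np.histogram_bin_edges(
    pd.concat([data1["observed"], data2["observed"]]), bins=20
)
bins_y = np.histogram_bin_edges(
    pd.concat([data1["predicted"], data2["predicted"]]), bins=20
)

# joint axes: first dataset via seaborn, second via Matplotlib
g = sns.JointGrid(data=data1, x="observed", y="predicted")
g.plot_joint(sns.scatterplot)
g.ax_joint.scatter(data=data2, x="observed", y="predicted", c="r")

# marginal axes: one semi-transparent histogram per dataset
for data, color in [(data1, "C0"), (data2, "r")]:
    g.ax_marg_x.hist(data["observed"], bins=bins_x, color=color, alpha=0.5)
    g.ax_marg_y.hist(
        data["predicted"], bins=bins_y, color=color, alpha=0.5,
        orientation="horizontal",
    )
```

Call `matplotlib.pyplot.show()` afterwards to display the figure. The transparency keeps the overlapping bars of both datasets visible.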

Second answer

To compare two datasets, you can concatenate the dataframes and then use hue.

By default, only a kdeplot is shown in the marginals when hue is used. Use g.plot_marginals(sns.histplot, kde=True, ...) to get histograms with a KDE overlay. Note that sns.distplot has been deprecated in favor of sns.histplot in newer Seaborn versions. When hue is used in sns.histplot, the same histogram bins are used for each subset. If histplot were called separately on each subset, the bins would differ, which makes the histograms harder to compare.
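
To see why shared bins matter, here is a small NumPy-only sketch. The sample data and the choice of 10 bins are assumptions for illustration. It does not reproduce seaborn's internals, only the idea of binning each subset on its own range versus binning both on the pooled range (which is what `histplot` does with its default `common_bins=True`).

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(50) ** 2 / 2       # hypothetical subset 1
b = rng.random(90) ** 2 / 2 + 1   # hypothetical subset 2, shifted by 1

# separate binning: each subset gets edges spanning only its own min..max
edges_a = np.histogram_bin_edges(a, bins=10)
edges_b = np.histogram_bin_edges(b, bins=10)

# shared binning: one set of edges spanning the pooled data
shared = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)
counts_a, _ = np.histogram(a, bins=shared)
counts_b, _ = np.histogram(b, bins=shared)

print("separate edges identical:", np.allclose(edges_a, edges_b))
print("counts on shared edges:", counts_a, counts_b)
```

With separate edges the bars of the two histograms have different widths and positions, so their heights cannot be compared bin by bin. With shared edges every bar of one subset has a direct counterpart in the other.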

Here is a complete example that uses hue.

import seaborn as sns
import pandas as pd
import numpy as np

# create some test data
x1 = np.random.rand(50) ** 2 / 2
y1 = np.random.normal(x1, x1 / 2)
x2 = np.random.rand(90) ** 2 / 2
y2 = np.random.normal(x2, 0.04)
data1 = pd.DataFrame({'observed': x1, 'predicted': y1})
data2 = pd.DataFrame({'observed': x2, 'predicted': y2})

# concatenate both dataframes, with a new column indicating the source
data12 = pd.concat({'data1': data1, 'data2': data2}, names=['source', 'ind']).reset_index()

# jointplot with hue
g = sns.JointGrid(data=data12, x='observed', y='predicted', hue='source', palette=['dodgerblue', 'crimson'])

# add the central plot
g.plot_joint(sns.scatterplot)

# add the marginal plots
# set common_norm to False to avoid scaling by relative size of each source
g.plot_marginals(sns.histplot, kde=True, stat='density', common_norm=False)

[Figure: joint plot comparing two datasets]
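
The concatenation step can also be checked in isolation. Below is a minimal sketch with tiny made-up frames (the values are placeholders). It shows that passing a dict to `pd.concat` with `names=` and then calling `reset_index()` yields a `source` column that can be handed to `hue`.

```python
import pandas as pd

# tiny placeholder frames with the same columns as in the example above
data1 = pd.DataFrame({"observed": [0.1, 0.2], "predicted": [0.15, 0.18]})
data2 = pd.DataFrame({"observed": [0.3, 0.4, 0.5], "predicted": [0.31, 0.38, 0.52]})

# the dict keys become the outer index level, named 'source';
# the original row index becomes the inner level, named 'ind'
data12 = pd.concat(
    {"data1": data1, "data2": data2}, names=["source", "ind"]
).reset_index()

print(data12.columns.tolist())
print(data12["source"].value_counts().to_dict())
```

Each row keeps a label saying which dataframe it came from, so a single seaborn call can color and bin both datasets together.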