I am trying to make a jointplot in Seaborn. The goal is to have a scatter plot of all [x,z] values and to have these color-coded by [cat], and to have the distributions for these two categories. Then I also want a scatter and distribution plot of [x,alt_Z], ignoring the alt_Z values that are NaN.
Using Python 3.7
Here is a stand-alone dataset and my goal (made in Excel, so the distributions are not shown).
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import seaborn as sns
col1 = [1,1.5,3.1,3.4,2,-1]
col2 = [1,-3,2,8,2.5,-1.3]
col3 = [4,3,4,0.5,1,0.3]
col4 = [10,12,10,'NaN',13,'NaN']
col5 = ['A','A','A','B','A','B']
df = pd.DataFrame(list(zip(col1, col2, col3, col4, col5)),
columns =['x', 'y', 'z', 'alt_Z', 'cat'])
display(df)
The code below doesn't finish the plot and raises TypeError: The y variable is categorical, but one of ['numeric', 'datetime'] is required. I also don't know how, in the code below, to group by [cat] A & B, so it is shown as red and only the A category is plotted.
df2 = df[['x', 'y', 'z', 'alt_Z', 'cat']]\
.melt(id_vars=['x', 'y'], value_vars=['z', 'alt_Z'])
g = sns.jointplot(data=df2, x='x', y='value', hue='variable',
palette={'z': 'black', 'alt_Z': 'red'})
One problem with the dataframe is that col4 contains integers and the string 'NaN'. As NaN values don't exist for integers, pandas makes it a column of objects. Converting it to floats will create a proper float column in which NaN is a regular floating-point value.
To create the scatter plot, two calls to sns.scatterplot() will do.
From here, we can create 2 dataframes: df1 containing x, z and cat, and df2 containing x and alt_Z. Renaming alt_Z to z and filling in a cat column containing the string alt_Z will make it similar to df1.
The jointplot() can then operate on the concatenation of both dataframes.