I've been having issues with plotting a regression line for two groups using seaborn.
Here is a test df:
import pandas as pd
import seaborn as sns

data = [['2020-06-01', '521732A', 1, 195.0, 0.0],
['2020-06-02', '521732A', 34, 250.0, 0.01],
['2020-06-03', '521732A', 55, 180.0, 0.0],
['2020-06-05', '521732B', 5, 195.0, 0.02],
['2020-06-01', '521732B', 1, 195.0, 0.0],
['2020-06-01', '521732B', 44, 260.0, 0.03]]
df = pd.DataFrame(data, columns=['date', 'product_id', 'clicks', 'price', 'cvr'])
When I plot a normal relplot there are no issues:
sns.relplot(data=df, x=df['price'], y=df['cvr'], hue=df['product_id'], aspect=1.5)
However, when I try to use lmplot, I get a TypeError and I can't figure out why. I tried converting product_id from object to category, but I get the same error. Here is the code I used for the plot:
sns.lmplot(x=df['price'],
y=df['cvr'],
hue=df['product_id'],
data=df)
And the error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/qb/ncs79_z94w91r9f__qrwtdfh0000gn/T/ipykernel_1459/879869642.py in <module>
----> 1 sns.lmplot(x=df['price'],
2 y=df['cvr'],
3 hue=df['product_id'],
4 data=df)
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/seaborn/regression.py in lmplot(x, y, data, hue, col, row, palette, col_wrap, height, aspect, markers, sharex, sharey, hue_order, col_order, row_order, legend, legend_out, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, seed, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, x_jitter, y_jitter, scatter_kws, line_kws, facet_kws, size)
602 # Reduce the dataframe to only needed columns
603 need_cols = [x, y, hue, col, row, units, x_partial, y_partial]
--> 604 cols = np.unique([a for a in need_cols if a is not None]).tolist()
605 data = data[cols]
606
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/overrides.py in unique(*args, **kwargs)
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
270 ar = np.asanyarray(ar)
271 if axis is None:
--> 272 ret = _unique1d(ar, return_index, return_inverse, return_counts)
273 return _unpack_tuple(ret)
274
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
331 aux = ar[perm]
332 else:
--> 333 ar.sort()
334 aux = ar
335 mask = np.empty(aux.shape, dtype=np.bool_)
TypeError: '<' not supported between instances of 'str' and 'float'
The usual way to plot with seaborn is to pass x and y as column names, not as the columns themselves. The seaborn.lmplot docs describe x and y as:
"Input variables; these should be column names in data."
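It works for seaborn.relplot because seaborn's plotting functions are inconsistent on this point. Some accept either the data itself (for example df['price']) or the column names of the dataframe, while others, lmplot among them, accept only column names. The traceback shows where this goes wrong: lmplot collects x, y and hue into need_cols and passes that list to np.unique to select the columns it needs. When Series are passed instead of names, np.unique ends up sorting a mix of strings and floats, which raises the TypeError.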
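As a minimal sketch of the fix, assuming the same test df as in the question, passing the column names as strings should avoid the error. The variable name g is only for illustration; lmplot returns a FacetGrid.

```python
import pandas as pd
import seaborn as sns

# Same test data as in the question
data = [['2020-06-01', '521732A', 1, 195.0, 0.0],
        ['2020-06-02', '521732A', 34, 250.0, 0.01],
        ['2020-06-03', '521732A', 55, 180.0, 0.0],
        ['2020-06-05', '521732B', 5, 195.0, 0.02],
        ['2020-06-01', '521732B', 1, 195.0, 0.0],
        ['2020-06-01', '521732B', 44, 260.0, 0.03]]
df = pd.DataFrame(data, columns=['date', 'product_id', 'clicks', 'price', 'cvr'])

# Pass column names (strings), not the Series themselves
g = sns.lmplot(x='price', y='cvr', hue='product_id', data=df)
```

The same string-based call style also works for relplot, so using column names everywhere keeps the two calls consistent.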