TypeError when adding regression line per group with seaborn in python

I've been having issues with plotting a regression line for two groups using seaborn.

Here is a test df:

import pandas as pd
import seaborn as sns

data = [['2020-06-01', '521732A', 1, 195.0, 0.0],
    ['2020-06-02', '521732A', 34, 250.0, 0.01],
    ['2020-06-03', '521732A', 55, 180.0, 0.0],
    ['2020-06-05', '521732B', 5, 195.0, 0.02],
    ['2020-06-01', '521732B', 1, 195.0, 0.0],
    ['2020-06-01', '521732B', 44, 260.0, 0.03]]

df = pd.DataFrame(data, columns=['date', 'product_id', 'clicks', 'price', 'cvr'])

When I plot a normal relplot there are no issues:

sns.relplot(data=df, x=df['price'], y=df['cvr'], hue=df['product_id'], aspect=1.5)

However, when I try to use lmplot, I get a TypeError and I can't figure out why. I tried converting product_id from object to category, but I get the same error. Here is the code I used for the plot:

sns.lmplot(x=df['price'],
           y=df['cvr'],
           hue=df['product_id'],
           data=df)

And the error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/qb/ncs79_z94w91r9f__qrwtdfh0000gn/T/ipykernel_1459/879869642.py in <module>
----> 1 sns.lmplot(x=df['price'],
      2                y=df['cvr'],
      3                hue=df['product_id'],
      4                data=df)

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/seaborn/regression.py in lmplot(x, y, data, hue, col, row, palette, col_wrap, height, aspect, markers, sharex, sharey, hue_order, col_order, row_order, legend, legend_out, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, seed, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, x_jitter, y_jitter, scatter_kws, line_kws, facet_kws, size)
    602     # Reduce the dataframe to only needed columns
    603     need_cols = [x, y, hue, col, row, units, x_partial, y_partial]
--> 604     cols = np.unique([a for a in need_cols if a is not None]).tolist()
    605     data = data[cols]
    606 

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/overrides.py in unique(*args, **kwargs)

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    270     ar = np.asanyarray(ar)
    271     if axis is None:
--> 272         ret = _unique1d(ar, return_index, return_inverse, return_counts)
    273         return _unpack_tuple(ret)
    274 

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    331         aux = ar[perm]
    332     else:
--> 333         ar.sort()
    334         aux = ar
    335     mask = np.empty(aux.shape, dtype=np.bool_)

TypeError: '<' not supported between instances of 'str' and 'float'

There is 1 solution below

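lmplot expects x, y and hue to be column names in data (strings), not the columns themselves. As the traceback shows, lmplot collects these arguments into need_cols and passes them to np.unique. When they are Series, that call ends up sorting the float values of price and cvr together with the string values of product_id, which raises the TypeError.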

See seaborn.lmplot docs

sns.lmplot(
    x='price',
    y='cvr',
    hue='product_id',
    data=df
)

It works with seaborn.relplot because seaborn's plotting functions are not consistent about this: relplot accepts either vectors (such as Series) or column names in data, whereas lmplot only accepts column names.
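
The traceback in the question ends inside a sort triggered by np.unique on the arguments passed as x, y and hue. The following is a simplified sketch in plain Python of why that sort fails. It is not seaborn's actual code: unique_sorted is a hypothetical helper standing in for the sort-based deduplication, and the values are copied from the test df.

```python
def unique_sorted(values):
    # Hypothetical stand-in for a sort-based "unique" step:
    # drop duplicates, then sort what is left.
    return sorted(set(values))

# Passing column names: everything is a string, so sorting works.
names = unique_sorted(["price", "cvr", "product_id"])

# Passing the columns themselves: floats from price/cvr end up
# next to strings from product_id, and they cannot be compared.
mixed_values = [195.0, 250.0, 0.0, 0.01, "521732A", "521732B"]
try:
    unique_sorted(mixed_values)
    error = None
except TypeError as exc:
    error = str(exc)

print(names)
print(error)
```

With column names the helper returns a sorted list of strings. With the mixed values it raises the same kind of TypeError as in the question, because Python has no ordering between str and float.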