kdeplot produces unexpected results

Question

Asked by Eddy-Python on 21 October 2024 at 03:38 · 1.1k views

I created some simple seaborn KDE plots and wonder whether this is a bug.

My code is:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

sns.kdeplot(np.array([1, 2]), color="red", fill=True, bw_method=0.01)
sns.kdeplot(np.array([2.4, 2.5]), color="blue", fill=True, bw_method=0.01)
plt.show()

The red and blue curves show the KDEs of two points each. When the points are close together, the densities are much narrower than when the points are farther apart. I find this very counterintuitive, at least as far as I can tell from the plot, and I am wondering whether it might be a bug. I also could not find a resource describing how the densities are computed from a given set of points. Any help is appreciated.

Original Q&A


**JohanC** · Accepted Answer

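The bw_method= parameter (called bw= in older versions) is passed directly to scipy.stats.gaussian_kde. The docs there say: "If a scalar, this will be used directly as kde.factor". The explanation of kde.factor says: "The square of kde.factor multiplies the covariance matrix of the data in the kde estimation." So it is a scaling factor relative to the spread of the data, not a bandwidth in data units: in one dimension, each Gaussian kernel gets a standard deviation of kde.factor times the standard deviation of the data. Your red points are 10 times farther apart than your blue points, so the red kernels end up 10 times wider. If still more details are needed, you could dive into scipy's source code or into the research papers referenced in the docs.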
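To see the scaling in numbers, here is a minimal sketch (assuming a recent SciPy) that asks scipy.stats.gaussian_kde which kernel width it actually uses. For 1-D data, kde.covariance is a 1x1 matrix equal to kde.factor**2 times the sample variance (computed with ddof=1), so its square root is the kernel standard deviation. The helper name kernel_std is hypothetical, just for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kernel_std(data, bw_method):
    """Hypothetical helper: standard deviation of each Gaussian kernel gaussian_kde uses (1-D data)."""
    kde = gaussian_kde(data, bw_method=bw_method)
    # kde.covariance = factor**2 * sample variance of the data (ddof=1), as a 1x1 matrix
    return float(np.sqrt(kde.covariance[0, 0]))

for data in (np.array([1.0, 2.0]), np.array([2.4, 2.5])):
    # compare with factor * sample standard deviation computed by NumPy
    print(data, kernel_std(data, 0.01), 0.01 * np.std(data, ddof=1))
```

With bw_method=0.01 the kernels for [1, 2] come out about 0.0071 wide and those for [2.4, 2.5] about 0.00071, i.e. ten times narrower, which is exactly the ratio of the two pairs' spreads.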

If you really want to counteract the scaling, you could divide it away, using the same sample standard deviation (ddof=1) that scipy uses: sns.kdeplot(np.array(data), ..., bw_method=0.01/np.std(data, ddof=1)).
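A quick check of this workaround, run against scipy.stats.gaussian_kde directly (the function seaborn hands bw_method to). It is a sketch assuming SciPy's 1-D behaviour described above: dividing by the ddof=1 standard deviation gives every data set the same kernel width in data units, whereas NumPy's default np.std (ddof=0) would leave two-point kernels wider by a factor of sqrt(2). The value target = 0.01 and the helper kernel_std are illustrative, not part of any API.

```python
import numpy as np
from scipy.stats import gaussian_kde

target = 0.01  # desired kernel standard deviation in data units (illustrative value)

def kernel_std(data, bw_method):
    # hypothetical helper: width of each Gaussian kernel that gaussian_kde uses (1-D data)
    return float(np.sqrt(gaussian_kde(data, bw_method=bw_method).covariance[0, 0]))

for data in (np.array([1.0, 2.0]), np.array([2.4, 2.5])):
    exact = kernel_std(data, target / np.std(data, ddof=1))  # cancels SciPy's scaling
    default = kernel_std(data, target / np.std(data))         # np.std defaults to ddof=0
    print(data, exact, default)
```

Both data sets now get kernels of width 0.01, independent of how far apart their points are.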

Alternatively, you could write your own Gaussian KDE with a bandwidth in data coordinates. It simply sums one Gaussian curve per data point and normalizes the result by dividing by the number of curves, so that the total area under the curve is 1.

Here is some example code with KDE curves for 1, 2, or 20 input points:
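Written out, the sum described above is the following, for n data points x_1, ..., x_n and a kernel standard deviation σ in data units. For scipy.stats.gaussian_kde in one dimension, σ is kde.factor times the sample standard deviation s of the data (computed with n - 1 in the denominator):

```latex
\hat f(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
            \exp\!\left(-\frac{(x - x_i)^2}{2\sigma^2}\right),
\qquad
\sigma_{\text{scipy}} = \text{factor}\cdot s,\quad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Each term integrates to 1, so dividing by n makes the total area 1.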

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

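def gauss(x, mu=0.0, sigma=1.0):
    # normal probability density with mean mu and standard deviation sigma
    return np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

def kde(xs, data, sigma=1.0):
    # one Gaussian per data point (broadcast to shape len(xs) x len(data)),
    # summed over the data points and divided by their count so the area is 1
    return gauss(xs.reshape(-1, 1), data.reshape(1, -1), sigma).sum(axis=1) / len(data)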

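sns.set_theme()  # seaborn's default look (sns.set() is the older alias)
sigma = 0.03  # kernel standard deviation, in data units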
xs = np.linspace(0, 4, 300)
fig, ax = plt.subplots(figsize=(12, 5))

data1 = np.array([1, 2])
kde1 = kde(xs, data1, sigma=sigma)
ax.plot(xs, kde1, color='crimson', label=f'dist of 1, σ={sigma}')
ax.fill_between(xs, kde1, color='crimson', alpha=0.3)

data2 = np.array([2.4, 2.5])
kde2 = kde(xs, data2, sigma=sigma)
ax.plot(xs, kde2, color='dodgerblue', label=f'dist of 0.1, σ={sigma}')
ax.fill_between(xs, kde2, color='dodgerblue', alpha=0.3)

data3 = np.array([3])
kde3 = kde(xs, data3, sigma=sigma)
ax.plot(xs, kde3, color='limegreen', label=f'1 point, σ={sigma}')
ax.fill_between(xs, kde3, color='limegreen', alpha=0.3)

data4 = np.random.normal(0.01, 0.1, 20).cumsum() + 1.1
kde4 = kde(xs, data4, sigma=sigma)
ax.plot(xs, kde4, color='purple', label=f'20 points, σ={sigma}')
ax.fill_between(xs, kde4, color='purple', alpha=0.3)

ax.margins(x=0)  # remove superfluous whitespace left and right
ax.set_ylim(bottom=0)  # let the plot "sit" onto y=0
ax.legend()
plt.show()
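As a sanity check, here is a short sketch (assuming SciPy is installed; gauss and kde are re-declared so the snippet runs on its own) that compares the hand-written KDE with scipy.stats.gaussian_kde after converting σ into SciPy's relative factor, and confirms that the area under the curve is about 1 using a simple trapezoid sum.

```python
import numpy as np
from scipy.stats import gaussian_kde

def gauss(x, mu=0.0, sigma=1.0):
    # normal probability density with mean mu and standard deviation sigma
    return np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

def kde(xs, data, sigma=1.0):
    # average of one Gaussian per data point
    return gauss(xs.reshape(-1, 1), data.reshape(1, -1), sigma).sum(axis=1) / len(data)

sigma = 0.03
xs = np.linspace(0, 4, 4001)  # fine grid (step 0.001) so the trapezoid sum is accurate
data = np.array([1.0, 2.0])

manual = kde(xs, data, sigma=sigma)
# convert the data-unit bandwidth into SciPy's relative factor (sample std, ddof=1)
from_scipy = gaussian_kde(data, bw_method=sigma / np.std(data, ddof=1))(xs)

# trapezoid rule written out, to avoid depending on np.trapz / np.trapezoid naming
area = np.sum((manual[1:] + manual[:-1]) / 2 * np.diff(xs))
print(np.max(np.abs(manual - from_scipy)), area)
```

The two curves agree to floating-point precision, which shows that the hand-written version is the same estimator, just parameterized by σ in data units instead of a factor relative to the data's spread.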
