Calculating the area of a 2D KDE plot


I have data in a DataFrame with two columns, an x column and a y column. I feed the data into sns.kdeplot like this: sns.kdeplot(data=data, x="x", y="y", fill=True, common_norm=False, alpha=0.7, color=color) and get a plot with several layers. I want to calculate the area of the plot, meaning the area enclosed by the outermost outline: the lowest-density layer, since it covers the largest area, and only that layer.

I've tried getting the levels from the plot, but it returns None instead of a list of numbers (which is what I'd expect with several levels).

Example code:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = x**2

data = {"X": x, "Y": y}

kde_plot = sns.kdeplot(data=data, x="X", y="Y", common_norm=False, color='blue')
levels = kde_plot.collections[0].get_array()
print(levels)

plt.title('2D KDE Plot with Custom Data')
plt.xlabel('X')
plt.ylabel('Y')

plt.show()

If there's a more efficient way to calculate the area, I would greatly appreciate the input.
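A route that avoids reading the plot artists altogether is to evaluate the KDE on a grid, turn the lowest iso-proportion level into a density threshold, and multiply the number of grid cells above it by the cell area. The sketch below assumes seaborn's documented default thresh=0.05 (5% of the probability mass lies below the lowest contour) and uses scipy.stats.gaussian_kde with Scott's rule; the grid extent and size (cut, gridsize) and the helper names are illustrative assumptions, so the result should be close to, not identical with, what seaborn draws.

```python
import numpy as np
from scipy.stats import gaussian_kde

def iso_proportion_level(density, proportion):
    """Density value below which `proportion` of the gridded probability mass lies."""
    values = np.sort(density.ravel())[::-1]        # highest density first
    cum_mass = np.cumsum(values) / values.sum()    # mass enclosed as the threshold drops
    idx = np.searchsorted(cum_mass, 1 - proportion)
    return values[min(idx, values.size - 1)]

def area_above_level(density, xx, yy, level):
    """Approximate area of the region where density >= level on a regular grid."""
    dx = xx[0, 1] - xx[0, 0]
    dy = yy[1, 0] - yy[0, 0]
    return np.count_nonzero(density >= level) * dx * dy

def kde_outline_area(x, y, thresh=0.05, gridsize=200, cut=3):
    """Area inside the lowest KDE contour (hypothetical helper, not a seaborn API)."""
    kde = gaussian_kde(np.vstack([x, y]))              # Scott's rule bandwidth
    bw_x, bw_y = np.sqrt(np.diag(kde.covariance))      # per-axis kernel std. dev.
    gx = np.linspace(x.min() - cut * bw_x, x.max() + cut * bw_x, gridsize)
    gy = np.linspace(y.min() - cut * bw_y, y.max() + cut * bw_y, gridsize)
    xx, yy = np.meshgrid(gx, gy)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    level = iso_proportion_level(density, thresh)
    return area_above_level(density, xx, yy, level)

x = np.linspace(0, 10, 100)
y = x**2
print("Area inside the lowest contour:", kde_outline_area(x, y))

# Sanity check on an analytic standard normal: the region holding 95% of the
# mass is a disc of radius sqrt(2 ln 20), so its area is 2*pi*ln(20)
g = np.linspace(-6, 6, 400)
gxx, gyy = np.meshgrid(g, g)
normal = np.exp(-(gxx**2 + gyy**2) / 2) / (2 * np.pi)
check_area = area_above_level(normal, gxx, gyy, iso_proportion_level(normal, 0.05))
print(check_area, 2 * np.pi * np.log(20))
```

Counting grid cells trades a little resolution for robustness: it needs no contour paths, it handles holes and separate islands automatically, and a larger gridsize tightens the estimate.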

Answer

I extract the outermost contour path, which traces the outline of the lowest density level, and then calculate the area it encloses with the shoelace formula.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.contour import ContourSet


def polygon_area(vertices):
    """Area enclosed by a closed polygon given as an (N, 2) array (shoelace formula)."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


# Generate example data
x = np.linspace(0, 10, 100)
y = x**2
data = {"X": x, "Y": y}

# Create the filled KDE plot; kdeplot returns the Axes it drew on
kde_plot = sns.kdeplot(data=data, x="X", y="Y", fill=True, common_norm=False, color='blue')

# Find the filled contour set (Matplotlib >= 3.8 adds one ContourSet per contourf call)
contour_set = next(c for c in kde_plot.collections if isinstance(c, ContourSet))

# Path 0 is the band between the two lowest levels; its polygons are the outline(s)
# of the lowest density plus the holes where the higher-density bands sit
lowest_band = contour_set.get_paths()[0]
polygons = lowest_band.to_polygons()

# The lowest-density outline encloses the holes, so it is the largest polygon
# (if the density has several separate islands, only the biggest one is counted)
area_outline = max(polygon_area(p) for p in polygons)

print("Area of the outline (lowest density):", area_outline)

plt.title('2D KDE Plot with Custom Data')
plt.xlabel('X')
plt.ylabel('Y')

plt.show()

This gives the area enclosed by the outermost (lowest-density) contour, in units of X times Y.

Output: (plot image not included)
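For reference, the area enclosed by a closed outline with vertices (x_i, y_i), i = 0, ..., n-1, listed in order around the outline, is given by the shoelace formula. It only needs the vertices in traversal order; the x values do not have to increase along the outline:

```latex
A = \frac{1}{2}\left|\sum_{i=0}^{n-1}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right)\right|,
\qquad (x_n, y_n) \equiv (x_0, y_0)
```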