Add weights to density function


I want to create a chart with predicted values on the X axis and actual values on the Y axis, showing a scatter of the points together with a density plot (or heat map). Each observation also has a "units" (volume) variable, and I want to use it to weight the points when computing the contours / colors. When I run the code below, I get a warning indicating that the weights were not actually used. The chart looks exactly like what I want, but the shapes are computed without the weights.


import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Random data for the purposes of this post; in practice I have an actual DataFrame here
X = np.random.rand(100000)
Y = np.random.rand(100000)
units = np.random.rand(100000)

# Combine X, Y, and units into a DataFrame
kde_data = pd.DataFrame({
    'X': X,
    'Y': Y,
    'units': units
})

# Drop rows with NaN values
kde_data.dropna(inplace=True)

# Check if there is sufficient data
if len(kde_data) < 3:
    print("Insufficient data to create the density plot.")
else:
    # Create a KDE plot
    plt.figure(figsize=(10, 8))
    sns.kdeplot(data=kde_data[['X', 'Y']], weights=kde_data['units'], fill=True, cmap='viridis')
    plt.title('Density Plot with units as Weights')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.show()

I get this warning along with the plot:

The following kwargs were not used by contour: 'weights', 'fill'

I could duplicate each observation in proportion to its weight, but that would exhaust my compute resources: there are far too many observations, and some weights are in the millions.
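For the heat-map variant, a weighted 2D histogram gives the same result as duplicating rows without ever materialising the duplicates, since each observation simply adds its weight to one bin. Below is a minimal sketch using `numpy.histogram2d` with its `weights` argument. The function name `weighted_density_grid` and the default of 50 bins are assumptions for illustration only.

```python
import numpy as np


def weighted_density_grid(x, y, weights, bins=50):
    """Return a normalised weighted 2D histogram and its bin edges.

    Each point contributes `weights[i]` to its bin, so memory scales with
    the number of bins rather than with the sum of the weights.
    """
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins, weights=weights)
    total = hist.sum()
    if total > 0:
        hist = hist / total  # cells now sum to 1, like a discrete density
    return hist, x_edges, y_edges
```

The grid can then be drawn with `plt.pcolormesh(x_edges, y_edges, hist.T, cmap="viridis")`. The transpose is needed because `histogram2d` puts X along the first axis, while `pcolormesh` expects rows to correspond to Y.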
