I want to create a chart with predicted values on the X axis and actual values on the Y axis: a scatter plot of the points overlaid with a density plot weighted by volume. Each observation has a "units" variable that holds this volume. I want to draw a density plot or heat map of the points, using `units` to weight each point when the contours/colors are computed. With the code below I get a warning saying the weights were not used. The chart looks exactly the way I want, but its shapes are computed without the weights.
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Random data for the purposes of this post; in the real world I have an actual DataFrame here
X = np.random.rand(100000)
Y = np.random.rand(100000)
units = np.random.rand(100000)

# Combine X, Y, and units into a DataFrame
kde_data = pd.DataFrame({
    'X': X,
    'Y': Y,
    'units': units
})

# Drop rows with NaN values
kde_data.dropna(inplace=True)

# Check if there is sufficient data
if len(kde_data) < 3:
    print("Insufficient data to create the density plot.")
else:
    # Create a KDE plot
    plt.figure(figsize=(10, 8))
    sns.kdeplot(data=kde_data[['X', 'Y']], weights=kde_data['units'], fill=True, cmap='viridis')
    plt.title('Density Plot with units as Weights')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.show()
```
When I run it, I get this warning along with the plot:

```
The following kwargs were not used by contour: 'weights', 'fill'
```
I would simply duplicate each observation in proportion to its weight, but that would exhaust my compute resources: there are far too many observations, and some weights are in the millions.
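For a binned heat map, weighting each point gives exactly the same result as duplicating it, without the memory cost. The sketch below uses only NumPy and hypothetical small integer weights so the comparison is exact. It builds a weighted 2-D histogram with `np.histogram2d(..., weights=...)` and checks it against the histogram of rows repeated with `np.repeat`. Note that this is a binned heat map, not a smoothed KDE.

```python
import numpy as np

# Hypothetical stand-in data: x = predicted, y = actual, units = weight per row.
rng = np.random.default_rng(0)
n = 1_000
x = rng.random(n)
y = rng.random(n)
units = rng.integers(1, 5, size=n)  # small integers so the duplication check is exact

bins = 20
edges = np.linspace(0.0, 1.0, bins + 1)

# Weighted 2-D histogram: each point adds its `units` value to its cell instead of 1.
weighted, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=units)

# The "duplicate each row by its weight" approach, built here only for comparison.
x_rep = np.repeat(x, units)
y_rep = np.repeat(y, units)
duplicated, _, _ = np.histogram2d(x_rep, y_rep, bins=[edges, edges])

print(np.array_equal(weighted, duplicated))  # prints True
```

To draw it as a heat map, `plt.pcolormesh(edges, edges, weighted.T)` works; the transpose is needed because `histogram2d` puts `x` on the first axis. Fractional weights such as the `units` column behave the same way; only the duplication comparison requires integers.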