I'm plotting two distributions as histplots, and would like to visualize the difference between them. The distributions are rather similar:
The code I am using to generate one of these plots looks like this:
sns.histplot(
    data=dfs_downvoted_percentages["only_pro"],
    ax=axes[0],
    x="percentage_downvoted",
    bins=30,
    stat="percent",
)
My supervisor suggested plotting the difference between the normalized distributions, basically displaying the subtraction of one plot from the other. The end result should be a plot where some bins go below 0 (if the bins in plot 2 are larger than in plot 1). Thus, similarities between the plots are erased and differences highlighted.
- Does this make sense? The plots are part of a paper which will hopefully be published; I haven't seen such a plot before, but as he explained it, it makes sense to me. Are there better ways to visualize what I want to express? I already have another plot where I filter out all values with x=0, so that the other ones become more visible.
- Is there an easy way to achieve this utilizing seaborn?
If not: I know how I can normalize the data and calculate percentage for each bin by hand. But what I couldn't find is a kind of plot that consists of bins and offers the possibility to have negative bins. I know how I could create a lineplot with 30 data points showing the calculated difference, but I'd rather have it visually similar to the original plots with bins instead of a line. What kind of plot could I use for that?
- Calculate the histograms with `np.histogram`, which returns `hist` and `bin_edges`.
- The same `bin_edges` must be used for both function calls.
- Calculate the difference between the `hist` of each dataframe, and plot it against `bin_edges`.
- Plot the difference, `h_diff`, as a bar plot.
- There is one more `bin_edge` than there are bars, so select all but the last value, `bin_edges[:-1]`, for the x-axis labels passed to `x=`.
- The ticks of `sns.barplot` are 0-indexed, so reset the ticks with an extra tick, offset them by `-0.5`, and relabel the ticks with all the `bin_edges`.
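The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `percent_hist_difference` and `plot_hist_difference` are made up for illustration, the shared edges are taken over both samples combined via `np.histogram_bin_edges`, and the counts are converted to percentages by hand to mirror `stat="percent"`. Both samples are assumed to be non-empty.

```python
import numpy as np


def percent_hist_difference(a, b, bins=30):
    """Return (h_diff, bin_edges): per-bin percent of `a` minus percent of `b`.

    Hypothetical helper; assumes both samples are non-empty.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # One shared set of edges, computed over both samples, so the bins line up.
    bin_edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    hist_a, _ = np.histogram(a, bins=bin_edges)
    hist_b, _ = np.histogram(b, bins=bin_edges)
    # Normalize counts to percentages, similar to stat="percent" in histplot.
    pct_a = 100 * hist_a / hist_a.sum()
    pct_b = 100 * hist_b / hist_b.sum()
    return pct_a - pct_b, bin_edges


def plot_hist_difference(h_diff, bin_edges, ax=None):
    """Draw the difference as bars; negative bars mean the second sample is larger."""
    # Imported here so the numeric helper works without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    # One more edge than bars: pass each bar's left edge as its x value.
    sns.barplot(x=bin_edges[:-1], y=h_diff, color="tab:blue", ax=ax)
    # Bars sit at positions 0, 1, 2, ...; move the ticks onto the bar
    # boundaries (one extra tick) and label them with all the edges.
    ax.set_xticks(np.arange(len(bin_edges)) - 0.5)
    ax.set_xticklabels(np.round(bin_edges, 1), rotation=90)
    ax.axhline(0, color="black", linewidth=0.8)
    ax.set(xlabel="percentage_downvoted", ylabel="difference (percentage points)")
    return ax
```

With the data from the question, the first argument would be `dfs_downvoted_percentages["only_pro"]["percentage_downvoted"]` and the second the same column of the other dataframe. The result of `percent_hist_difference` can then be passed to `plot_hist_difference`, optionally with one of the existing `axes` as `ax`.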