Is there a way to calculate and plot the mean and standard deviation of a multi-column pandas DataFrame?

I have three pandas DataFrames, each with many columns:

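import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from scipy import stats
import seaborn as sns
# 1000 rows x 100 columns of random integers in [0, 1000]
data = np.random.randint(0, 1001, size=(1000, 100))
# Clamp negatives to zero (a no-op for this random data, which is already non-negative)
data[data < 0] = 0
# Three DataFrames built from the same array (identical here, for illustration)
df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data)
df3 = pd.DataFrame(data)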

I want to plot every column of each DataFrame as overlaid (unstacked) histograms on a log scale:

num_bin = 30
figure, axes = plt.subplots(1, 3, sharex=True, figsize=(15,5))
axes[0].set_title('log df1')
axes[1].set_title('log df2')
axes[2].set_title('log df3')
sns.histplot(ax=axes[0],data=df1[df1>0], bins=num_bin, log_scale=True, weights=None, legend=False, palette='viridis', alpha=0.1)
sns.histplot(ax=axes[1],data=df2[df2>0], bins=num_bin, log_scale=True, weights=None, legend=False, palette='husl', alpha=0.1)
sns.histplot(ax=axes[2],data=df3[df3>0], bins=num_bin, log_scale=True, weights=None, legend=False, palette='inferno', alpha=0.1)
plt.show() 

I would like to see the mean count of each bin (computed across all columns of a DataFrame) plotted as a dot, with the standard deviation plotted as a vertical line. Do you have any suggestions?
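One possible approach, sketched under some assumptions: histogram every column on the same log-spaced bin edges with np.histogram, then take the mean and standard deviation of the counts across columns and draw them with Axes.errorbar. The helper names bin_stats and plot_bin_stats are hypothetical, and the edges built with np.geomspace should only roughly correspond to the bins seaborn picks with log_scale=True, so treat this as a starting point rather than an exact overlay.

```python
import numpy as np
import pandas as pd


def bin_stats(df, num_bin=30):
    """Return bin centers plus the per-bin mean and std of counts across columns."""
    values = df.to_numpy()
    positive = values[values > 0]  # zeros cannot be placed on a log axis
    lo, hi = float(positive.min()), float(positive.max())
    # Log-spaced edges shared by every column, so counts are comparable bin by bin.
    edges = np.geomspace(lo, hi, num_bin + 1)
    # Pin the end points so floating-point rounding cannot drop the extreme values.
    edges[0], edges[-1] = lo, hi
    # One histogram per column: counts has shape (n_columns, num_bin).
    counts = np.array([np.histogram(col[col > 0], bins=edges)[0] for col in values.T])
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric centers suit a log axis
    # Population standard deviation (ddof=0); pass ddof=1 to std for the sample version.
    return centers, counts.mean(axis=0), counts.std(axis=0)


def plot_bin_stats(ax, df, num_bin=30, **errorbar_kwargs):
    """Draw the mean count per bin as a dot and the std as a vertical line on ax."""
    centers, mean, std = bin_stats(df, num_bin)
    ax.errorbar(centers, mean, yerr=std, fmt="o", markersize=3, **errorbar_kwargs)
    ax.set_xscale("log")
    return centers, mean, std


# Example data shaped like the question's: 1000 rows x 100 columns.
rng = np.random.default_rng(0)
example_df = pd.DataFrame(rng.integers(0, 1001, size=(1000, 100)))
centers, mean, std = bin_stats(example_df, num_bin=30)
```

With the figure from the question, calling plot_bin_stats(axes[0], df1, num_bin, color='k') after the corresponding sns.histplot call (and likewise for axes[1] and axes[2]) would draw the dots and vertical lines on top of the histograms before plt.show().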
