I have written the following:

    import matplotlib.pyplot as plt

    # pivot_table averages sale_amount_usd per month (rows) and year (columns);
    # margins=True adds an "All" row and column
    ax = df.pivot_table(index='month', columns='year', values='sale_amount_usd',
                        margins=True, fill_value=0).round(2).plot(
        kind='bar', colormap='Blues', figsize=(18, 15))
    plt.legend(loc='best')
    plt.ylabel('Average Sales Amount in USD')
    plt.xlabel('Month')
    plt.xticks(rotation=0)
    plt.title('Average Sales Amount in USD by Month/Year')
    # Write each bar's height just above it
    for p in ax.patches:
        ax.annotate(str(p.get_height()), (p.get_x() * 1.001, p.get_height() * 1.005))
    plt.show()

Which returns a nice bar chart:

[Figure: bar chart titled "Average Sales Amount in USD by Month/Year"]

I'd now like to be able to tell whether the differences in means within each month, between years, are statistically significant. In other words, is the jump from $321 in March 2013 to $365 in March 2014 a significant increase in average sales amount?

How would I do this? Is there a way to overlay a marker on the bar chart that tells me, visually, when a difference is significant?
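For reference, the usual way to make "significant" precise here is a two-sample t-test on the individual rows of March 2013 versus March 2014. Welch's version, which does not assume the two years have equal variances, compares the difference in sample means $\bar{x}$ with its standard error, built from each year's sample standard deviation $s$ and row count $n$; the p-value then comes from a t distribution with $\nu$ degrees of freedom:

$$
t = \frac{\bar{x}_{2014} - \bar{x}_{2013}}{\sqrt{\dfrac{s_{2013}^2}{n_{2013}} + \dfrac{s_{2014}^2}{n_{2014}}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_{2013}^2}{n_{2013}} + \dfrac{s_{2014}^2}{n_{2014}}\right)^2}
{\dfrac{\left(s_{2013}^2/n_{2013}\right)^2}{n_{2013}-1} + \dfrac{\left(s_{2014}^2/n_{2014}\right)^2}{n_{2014}-1}}
$$

With the figures quoted above the numerator is 365 - 321 = 44 USD; whether that is significant depends entirely on the denominator, i.e. on how much individual sale amounts vary within each March and how many rows each March contributes.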

Edited to add sample data:

        event_id    event_date  week_number week_of_month   holiday month   day year    pub_organization_id clicks  sales   click_to_sale_conversion_rate   sale_amount_usd per_sale_amount_usd per_click_sale_amount   pub_commission_usd  per_sale_pub_commission_usd per_click_pub_commission_usd
    0   3365    1/11/13 2   2   NaN 1. January  11  2013    214 11945   754 0.06    40311.75    53.46   3.37    2418.71 3.21    0.20
    1   13793   2/12/13 7   3   NaN 2. February 12  2013    214 11711   1183    0.10    73768.54    62.36   6.30    4426.12 3.74    0.38
    2   4626    1/15/13 3   3   NaN 1. January  15  2013    214 11561   1029    0.09    70356.46    68.37   6.09    4221.39 4.10    0.37
    3   10917   2/3/13  6   1   NaN 2. February 3   2013    167 11481   0   0.00    0.00    0.00    0.00    0.00    0.00    0.00
    4   14653   2/15/13 7   3   NaN 2. February 15  2013    214 11268   795 0.07    37262.56    46.87   3.31    2235.77 2.81    0.20
    5   18448   2/27/13 9   5   NaN 2. February 27  2013    214 11205   504 0.04    48773.71    96.77   4.35    2926.43 5.81    0.26
    6   11382   2/5/13  6   2   NaN 2. February 5   2013    214 11166   1324    0.12    93322.84    70.49   8.36    5599.38 4.23    0.50
    7   14764   2/16/13 7   3   NaN 2. February 16  2013    214 11042   451 0.04    22235.51    49.30   2.01    1334.14 2.96    0.12
    8   17080   2/23/13 8   4   NaN 2. February 23  2013    214 10991   248 0.02    14558.85    58.71   1.32    873.53  3.52    0.08
    9   21171   3/8/13  10  2   NaN 3. March    8   2013    214 10910   1081    0.10    52005.12    48.11   4.77    3631.28 3.36    0.33
    10  16417   2/21/13 8   4   NaN 2. February 21  2013    214 10826   507 0.05    44907.20    88.57   4.15    2694.43 5.31    0.25
    11  13399   2/11/13 7   3   NaN 2. February 11  2013    214 10772   1142    0.11    38549.55    33.76   3.58    2312.97 2.03    0.21
    12  1532    1/5/13  1   1   NaN 1. January  5   2013    214 10750   610 0.06    29838.49    48.92   2.78    1790.31 2.93    0.17
    13  22500   3/13/13 11  3   NaN 3. March    13  2013    214 10743   821 0.08    47310.71    57.63   4.40    3688.83 4.49    0.34
    14  5840    1/19/13 3   3   NaN 1. January  19  2013    214 10693   487 0.05    28427.35    58.37   2.66    1705.64 3.50    0.16
    15  19566   3/3/13  10  1   NaN 3. March    3   2013    214 10672   412 0.04    15722.29    38.16   1.47    1163.16 2.82    0.11
    16  26313   3/25/13 13  5   NaN 3. March    25  2013    214 10629   529 0.05    21946.51    41.49   2.06    1589.84 3.01    0.15
    17  19732   3/4/13  10  2   NaN 3. March    4   2013    214 10619   1034    0.10    37257.20    36.03   3.51    2713.71 2.62    0.26
    18  18569   2/28/13 9   5   NaN 2. February 28  2013    214 10603   414 0.04    40920.28    98.84   3.86    2455.22 5.93    0.23
    19  8704    1/28/13 5   5   NaN 1. January  28  2013    214 10548   738 0.07    29041.87    39.35   2.75    1742.52 2.36    0.17
Answer:

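Although not conclusive, you could draw error bars (through the yerr argument of DataFrame.plot) that span one standard deviation, and then eyeball whether the intervals overlap. Keep in mind that standard-deviation bars describe the spread of individual sales; for comparing means, the standard error of the mean (the standard deviation divided by √n) is the more relevant scale. Something like this (not tested):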

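    import matplotlib.pyplot as plt

    # Mean and standard deviation of sale_amount_usd for every month/year cell,
    # both laid out the same way: months as rows, years (plus "All") as columns
    means = df.pivot_table(index='month', columns='year', values='sale_amount_usd',
                           aggfunc='mean', margins=True, fill_value=0).round(2)
    stds = df.pivot_table(index='month', columns='year', values='sale_amount_usd',
                          aggfunc='std', margins=True, fill_value=0)

    # A yerr DataFrame with the same index/columns as the plotted data gives each
    # bar its own error bar (the matching month/year standard deviation)
    ax = means.plot(kind='bar', yerr=stds, colormap='Blues', figsize=(18, 15))
    plt.show()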
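Error-bar overlap is only a rough guide. A more direct sketch, assuming the same df columns (month, year, sale_amount_usd) and that each row is an independent observation, runs Welch's two-sample t-test (scipy.stats.ttest_ind with equal_var=False) separately for each month and draws an asterisk over the month groups where p < alpha, with ±1 standard-error bars. The helper names monthly_welch_tests, plot_with_significance and make_synthetic_sales are hypothetical, and the demo data are random numbers, not the poster's sales.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats


def monthly_welch_tests(df, year_a, year_b, value_col="sale_amount_usd"):
    """Welch's t-test of value_col between year_a and year_b, one test per month."""
    rows = []
    for month, grp in df.groupby("month"):
        a = grp.loc[grp["year"] == year_a, value_col].dropna()
        b = grp.loc[grp["year"] == year_b, value_col].dropna()
        if len(a) < 2 or len(b) < 2:
            continue  # each year needs at least two rows for a variance estimate
        res = stats.ttest_ind(a, b, equal_var=False)  # equal_var=False -> Welch
        rows.append({"month": month, "mean_a": a.mean(), "mean_b": b.mean(),
                     "t": res.statistic, "p_value": res.pvalue})
    cols = ["month", "mean_a", "mean_b", "t", "p_value"]
    return pd.DataFrame(rows, columns=cols).set_index("month")


def plot_with_significance(df, year_a, year_b, alpha=0.05,
                           value_col="sale_amount_usd"):
    """Bar chart of monthly means (+/- 1 SEM) with '*' over significant months."""
    pivot = dict(index="month", columns="year", values=value_col)
    means = df.pivot_table(aggfunc="mean", **pivot)[[year_a, year_b]]
    sems = df.pivot_table(aggfunc="sem", **pivot)[[year_a, year_b]]
    tests = monthly_welch_tests(df, year_a, year_b, value_col)

    ax = means.plot(kind="bar", yerr=sems, figsize=(18, 15), rot=0)
    # pandas centres the i-th group of bars at x = i
    for i, month in enumerate(means.index):
        if month in tests.index and tests.loc[month, "p_value"] < alpha:
            top = (means.loc[month] + sems.loc[month]).max()
            ax.annotate("*", (i, top), ha="center", va="bottom", fontsize=20)
    ax.set_ylabel("Average Sales Amount in USD")
    return ax, tests


def make_synthetic_sales(seed=0, n=30):
    """Hypothetical stand-in for df: random numbers, NOT the poster's data."""
    rng = np.random.default_rng(seed)
    rows = []
    for year in (2013, 2014):
        for month in ("1. January", "2. February", "3. March"):
            # Demo assumption: only March 2014 has a higher true mean
            mean = 360.0 if (year == 2014 and month == "3. March") else 320.0
            for value in rng.normal(mean, 10.0, size=n):
                rows.append({"month": month, "year": year,
                             "sale_amount_usd": value})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    demo = make_synthetic_sales()
    ax, tests = plot_with_significance(demo, 2013, 2014)
    print(tests.round(4))
    plt.show()
```

With the real data you would call plot_with_significance(df, 2013, 2014). With twelve months tested at alpha = 0.05, the chance of at least one false positive when no month truly differs is about 46% (assuming independent tests), so a stricter threshold such as the Bonferroni-corrected alpha=0.05/12 may be more appropriate.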