Adding percentages to a Venn diagram using matplotlib_venn


I have a data frame that looks like this:

           customerid  brand
0        A2222242BG84  A
1        A2222255LD3L  B
2        A2222255LD3L  A
3        A2222263537U  A
4        A2222265CE34  C
...               ...  ...
6679602  A9ZZ86K4VM97  B
6679603  A9ZZ9629MP6E  B
6679604  A9ZZ9629MP6E  C
6679605  A9ZZ9AB9RN5E  A
6679606  A9ZZ9C47PZ8G  C

where the brands are A, B and C. Many customers belong to one brand, two brands or all three, and I want to draw a Venn diagram showing how customers are shared across the brands. I've managed to write code that shows the counts in thousands, but I'm struggling to make the Venn diagram also show what percentage of the entire customer base each count represents.

Here is my complete code, which should be fully reproducible:

import matplotlib.pyplot as plt
import matplotlib_venn as venn

def count_formatter(count, branch_counts):
    # Convert count to thousands
    count = count / 1000
    # Return the count as a string, followed by the percentage
    return f'{count:.1f}K ({100 * count / sum(branch_counts.values):.1f}%)'

# Get counts of each branch
branch_counts = df['brand'].value_counts()

# Convert counts to sets
branch_sets = [set(group_data['customerid']) for _, group_data in df.groupby('brand')]

plt.figure(figsize=(10, 10))

# Generate the Venn diagram
venn.venn3(
    subsets=branch_sets, 
    set_labels=['A', 'B', 'C'], 
    subset_label_formatter=lambda count, branch_counts=branch_counts: count_formatter(count, branch_counts)
)

# Show the plot
plt.show()

The figure that's generated only shows 0.0% on all the instances. I don't see why this is.


Best answer:

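It should work if you modify the count_formatter function slightly. The function divides count by 1000 before computing the percentage, so every percentage comes out 1000 times too small and rounds to 0.0%. Just multiply count by 1000 again before calculating the percentage...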

def count_formatter(count, branch_counts):
    # Convert count to thousands
    count = count / 1000
    # Return the count as a string, followed by the percentage
    return f'{count:.1f}K ({100 * count*1000 / sum(branch_counts.values):.1f}%)'

... or alternatively convert the count value on the fly (without storing the new value):

def count_formatter(count, branch_counts):
    # Return the count as a string, followed by the percentage
    return f'{count/1000:.1f}K ({100 * count / sum(branch_counts.values):.1f}%)'
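
One more thing to watch, depending on what "percent of the entire customer base" should mean: sum(branch_counts.values) is the total number of rows, so a customer who appears under two or three brands is counted more than once in the denominator, while each Venn region counts unique customers. If the percentages are meant to be shares of unique customers, a sketch along these lines may be closer to what you want. The helper name make_count_formatter and the toy sets are hypothetical, used only for illustration:

```python
def make_count_formatter(total_customers):
    """Build a label formatter: count in thousands plus its share of total_customers."""
    def count_formatter(count):
        # Use the raw count for the percentage; scale only for display.
        return f'{count / 1000:.1f}K ({100 * count / total_customers:.1f}%)'
    return count_formatter

# Hypothetical toy data standing in for the per-brand customer sets.
brand_sets = {
    'A': {'c1', 'c2', 'c3'},
    'B': {'c2', 'c3', 'c4'},
    'C': {'c3', 'c5'},
}

# Unique customers across all brands (5 here).
total_customers = len(set().union(*brand_sets.values()))
# Row total, which is what summing the per-brand counts gives (8 here).
total_rows = sum(len(s) for s in brand_sets.values())

formatter = make_count_formatter(total_customers)
only_a = brand_sets['A'] - brand_sets['B'] - brand_sets['C']
print(formatter(len(only_a)))  # 0.0K (20.0%)
```

With the real data frame, total_customers could presumably be taken from df['customerid'].nunique(), and the formatter passed to venn3 as subset_label_formatter=make_count_formatter(total_customers). With that denominator the percentages of the seven regions add up to 100%.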