plot stacked bar chart using bokeh


I am trying to plot a stacked bar chart using Bokeh by following this segment of the documentation, but my data frame is a tad more complex. It looks like this:

   events    count     Name
    a          2       jerry
    b          1       jerry
    a          8       joe
    c          1       joe 
    b          4       megan
    c          1       megan 
   ...        ...       ...

data.Name.nunique() = 11 (these will be the columns) and data.events.nunique() = 167 (these will be the stacked segments within each column; note that not every user has raised every unique event).

So, according to the code from the docs, for the above segment of the dataframe I would need:

output_file("stacked.html")
names = data.Name.unique()          # ['jerry','joe','megan']
events = data.events.unique()       # ['a','b','c']
colors =["#c9d9d3", "#718dbf", "#e84d60"]        

data = {'names' : names,
        'a'   : [2, 8, 0],   # a raised 2 times by jerry, 8 times by joe , 0 times by megan
        'b'   : [1, 0, 4],
        'c'   : [0, 1, 1]}  

My question is twofold: 1) How do I create the data dictionary from my actual dataset? 2) Is there any alternative approach to solving this problem?

Best answer

Bokeh doesn't necessarily need a dictionary to work, so we can just use the DataFrame pivot method to achieve the desired transformation and plot the result directly.

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'events': ['a', 'b', 'a', 'c', 'b', 'c'],
...     'count': [2, 1, 8, 1, 4, 1],
...     'Name': ['jerry', 'jerry', 'joe', 'joe', 'megan', 'megan']})

>>> df
  events  count   Name
0  a      2      jerry
1  b      1      jerry
2  a      8      joe  
3  c      1      joe  
4  b      4      megan
5  c      1      megan

Transform the data:

>>> df2 = df.pivot(index="Name", columns="events", values="count").fillna(0)
>>> df2
events  a   b   c
Name            
jerry   2.0 1.0 0.0
joe     8.0 0.0 1.0
megan   0.0 4.0 1.0
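
To answer the first part of the question directly, the pivoted frame can also be turned into the dictionary layout used in the docs example. This is a minimal sketch, assuming the sample df above. It swaps in pivot_table with aggfunc="sum" (not what this answer uses) so that repeated Name/event pairs would be summed instead of raising an error, and the names wide and data are only illustrative:

```python
import pandas as pd

# Sample data from the question (long format: one row per Name/event pair)
df = pd.DataFrame({
    "events": ["a", "b", "a", "c", "b", "c"],
    "count": [2, 1, 8, 1, 4, 1],
    "Name": ["jerry", "jerry", "joe", "joe", "megan", "megan"],
})

# Wide format: one row per name, one column per event, 0 where a name
# never raised an event. aggfunc="sum" is an assumption about how
# duplicate Name/event rows should be combined.
wide = df.pivot_table(index="Name", columns="events", values="count",
                      aggfunc="sum", fill_value=0)

# Build the dict layout from the docs: a list of names plus one list per event
data = {"names": wide.index.tolist()}
for event in wide.columns:
    data[event] = wide[event].tolist()

print(data)
```

For the sample rows this should reproduce the hand-written dictionary from the question, with names ordered alphabetically by the pivot.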

Plot the data:

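from bokeh.plotting import figure, show
from bokeh.palettes import viridis

names = df2.index.tolist()      # one bar per name
events = df2.columns.tolist()   # one stacked segment per event
color = viridis(len(events))    # one color per event

p = figure(x_range=names)
# Bokeh converts df2 to a ColumnDataSource; the index becomes the "Name" column
p.vbar_stack(events, x="Name", source=df2, width=.9, color=color, legend_label=events)
show(p)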

An alternative way of plotting this is to use the HoloViews library (I'm adding this simply because HoloViews can produce far more concise code than Bokeh). HoloViews takes care of the data transformation for you, so no extra effort is needed:

import holoviews as hv
hv.extension("bokeh")

hv.Bars(df, kdims=["Name", "events"], vdims="count").opts(stacked=True)

As for alternative solutions, I'm not entirely sure. I can't see visual comparisons being very easy with 167 types of events (that's 167 unique colors, so the colors may not be easy to tell apart, not to mention an unwieldy legend with 167 entries). If this way of visualizing doesn't help, I would recommend using the HoloViews library to create a bar plot for each of your names. You can then toggle through a plot for each individual in the data.

import holoviews as hv
hv.extension("bokeh")

hv.Bars(df, kdims=["Name", "events"], vdims="count").groupby("Name")
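
If a single stacked chart is still preferred despite the number of event types, one possible workaround (not part of the original answer) is to keep only the most frequent events and lump the rest into one segment. The sketch below assumes the sample df from above; TOP_N and the "other" label are hypothetical choices, and whether lumping events together is acceptable depends on the data:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "events": ["a", "b", "a", "c", "b", "c"],
    "count": [2, 1, 8, 1, 4, 1],
    "Name": ["jerry", "jerry", "joe", "joe", "megan", "megan"],
})

TOP_N = 2  # hypothetical cutoff; something like 10 may suit 167 events

# Total count per event across all names, then the TOP_N largest
totals = df.groupby("events")["count"].sum()
top_events = totals.nlargest(TOP_N).index

# Relabel every event outside the top N as "other"
reduced = df.assign(
    events=df["events"].where(df["events"].isin(top_events), "other")
)

# Wide table with at most TOP_N + 1 columns, ready for vbar_stack
wide = reduced.pivot_table(index="Name", columns="events", values="count",
                           aggfunc="sum", fill_value=0)
print(wide)
```

The resulting frame can be passed to vbar_stack in the same way as df2, with a much shorter legend.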
