I am using 4 datasets that share the same structure (a sample of the data head is given below). I am using the plotly graph_objects Box function together with make_subplots to create the box plots. Below are the code and the plots it generates, which seem to be wrong.
   Rank        Company  Revenue  employees Industry  age
0     1        Walmart   523964  2,300,000     Tech   44
1     2  Sinopec Group   407009     71,200     Tech   56
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set up the subplots grid
fig = make_subplots(rows=2, cols=2,
                    # Set the subplot titles
                    subplot_titles=['Tech', 'Professional Services', 'Retail', 'Oil'])
# Add the Tech trace
fig.add_trace(go.Box(x=df_tech.Revenue, name='', showlegend=False), row=1, col=1)
# Add the Professional Services trace
fig.add_trace(go.Box(x=df_prof_serve.Revenue, name='', showlegend=False), row=1, col=2)
# Add the Retail trace
fig.add_trace(go.Box(x=df_retail.Revenue, name='', showlegend=False), row=2, col=1)
# Add the Oil trace
fig.add_trace(go.Box(x=df_oil.Revenue, name='', showlegend=False), row=2, col=2)
# Add a title (and show)
fig.update_layout({'title': {'text': 'Box plots of company revenues', 'x': .5, 'y': .9}})
fig.show()
Ideally the plots should look as follows. Instead of subplots, I also tried plotting a single dataset directly with go.Figure, but the result is the same incorrect plot. It looks like the outliers are being absorbed into the whiskers in all my plots. The intended plots below show the outliers clearly and the distribution correctly. Please suggest what should be done here.
Oh... I got it resolved. It was a silly error on my part. It occurred to me to check the data types, and I found that, as I had suspected, the 'Revenue' column was of 'object' type. I converted it to 'integer' and everything worked perfectly. Thanks all, and I hope this helps.
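For anyone hitting the same thing, here is a minimal sketch of the check and the conversion. The small DataFrame is a made-up stand-in for df_tech, and the comma stripping is an assumption in case the column contains thousands separators (as the employees column does). My real data may have needed less than this.

```python
import pandas as pd

# Hypothetical stand-in for df_tech: Revenue was read in as text, not numbers.
df_tech = pd.DataFrame({
    "Company": ["Walmart", "Sinopec Group"],
    "Revenue": ["523964", "407009"],
})

# Step 1: inspect the dtypes. A text/object dtype on Revenue is the warning sign.
print(df_tech.dtypes)

# Step 2: strip thousands separators (assumed possible) and convert to numbers.
# errors="coerce" turns unparseable entries into NaN instead of raising.
revenue_text = df_tech["Revenue"].astype(str).str.replace(",", "", regex=False)
df_tech["Revenue"] = pd.to_numeric(revenue_text, errors="coerce")

# Step 3: confirm the column is numeric before passing it to go.Box.
print(df_tech["Revenue"].dtype)
```

After the conversion, the same go.Box(x=df_tech.Revenue, ...) call receives numeric values, so the quartiles, whiskers and outliers are computed from numbers rather than from strings.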