How to line plot multiple columns with nan and on twinx / secondary_y


Update: the graph is fixed now, but I am having trouble plotting the legend: it only shows the legend for one of the plots, as seen in the picture below.

I am trying to plot a double-axis graph with twinx, but I am facing some difficulties, as seen in the picture below.

Any input is welcome! If you require any additional information, I am happy to provide it.

[Image: plot after adding the secondary y-axis with twinx]

Compare this with the original plot, before the z-axis was added:

[Image: original plot of the closing values, before the secondary y-axis was added]

I am unsure why my graph looks like that. Before I plotted the secondary y-axis (the pink line), the closing-value line could be seen in full, but now it seems cut off.

It may be due to my data, which is provided below.

Code I have currently:

import pandas as pd
import matplotlib.pyplot as plt

# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)

# define figure with a second y-axis that shares the x-axis
fig, ax5 = plt.subplots()
ax6 = ax5.twinx()

x = sg_df_merged.index
y = sg_df_merged["Adj Close"]
z = sg_df_merged["Singapore"]

curve1 = ax5.plot(x, y, label="Singapore", color="c")
curve2 = ax6.plot(x, z, label="Face Mask Compliance", color="m")
curves = [curve1, curve2]

# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")

# set x-axis values to 45 degree angle
for label in ax5.xaxis.get_ticklabels():
    label.set_rotation(45)

# draw a grid on ax5 (a bare `ax5.grid` without parentheses does nothing)
ax5.grid(True, color="k", linestyle="-", linewidth=0.3)

# plt.gca() is ax6 (the twin axes) at this point, so only its line appears in the legend
plt.gca().legend(loc='center left', bbox_to_anchor=(1.1, 0.5), title="Country Index")
plt.show()

Initially, I thought it was due to my Excel file having entirely blank rows, but I have since removed those rows. The sample data is included in this question.

I have also tried interpolating, but somehow it doesn't work.
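For reference, a minimal sketch of time-based interpolation with DataFrame.interpolate, assuming a DatetimeIndex and float columns like the sample data (the few rows below are copied from it). method="time" weights each filled value by the spacing between dates, and limit_area="inside" leaves leading and trailing NaN alone. Keep in mind that interpolation invents values between real observations, which may not be appropriate for monthly closing prices.

```python
import numpy as np
import pandas as pd

# A few rows copied from the sample data: monthly "Adj Close" values,
# with "Singapore" readings on other dates.
idx = pd.to_datetime(["2020-03-01", "2020-03-06", "2020-03-13", "2020-04-01"])
df = pd.DataFrame(
    {
        "Adj Close": [2481.22998, np.nan, np.nan, 2624.22998],
        "Singapore": [np.nan, 23.0, 21.0, np.nan],
    },
    index=idx,
)

# method="time" interpolates linearly in elapsed time along the DatetimeIndex;
# limit_area="inside" only fills NaN that have valid values on both sides.
filled = df.interpolate(method="time", limit_area="inside")
print(filled)
```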

Best answer
  • Only rows that were entirely NaN were dropped; many rows still contain NaN in one of the two columns.
  • For matplotlib to draw a connecting line between two data points, the points must be consecutive, with no NaN between them.
  • As a result, the plot API does not connect the data across the NaN values.
  • This can be dealt with by converting each pandas.Series to a DataFrame and using .dropna.
  • Note that x has been removed, because it would not match the index length of y or z, which are shorter after .dropna.
  • y is now a separate DataFrame, on which .dropna is used.
  • z is also a separate DataFrame, on which .dropna is used.
  • The x-values for each plot are the respective indices.
  • Tested in python v3.12.0, pandas v2.1.2, matplotlib v3.8.1.
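A small illustration of the first and last points above, using a few rows copied from the sample data: dropping only all-NaN rows (which is what removing blank rows does) still leaves rows with a NaN in one column, while dropping NaN per column leaves each series with only its own consecutive observations. Series.dropna is used directly here; it yields the same values as wrapping the column in a DataFrame first.

```python
import numpy as np
import pandas as pd

# A few rows copied from the sample data
idx = pd.to_datetime(
    ["2020-02-01", "2020-02-21", "2020-02-25", "2020-02-28", "2020-03-01"]
)
df = pd.DataFrame(
    {
        "Adj Close": [3011.080078, np.nan, np.nan, np.nan, 2481.22998],
        "Singapore": [np.nan, 24.0, np.nan, 22.0, np.nan],
    },
    index=idx,
)

# Removing fully blank rows keeps rows where only one column is NaN
no_blank_rows = df.dropna(how="all")

# Dropping NaN per column leaves each series with consecutive valid points,
# each keeping its own index for the x-axis
y = df["Adj Close"].dropna()
z = df["Singapore"].dropna()

print(len(no_blank_rows), len(y), len(z))
```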
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# read csv into variable
sg_df_merged = pd.read_csv("test.csv", parse_dates=[0], index_col=0)

# define figure
fig, ax5 = plt.subplots(figsize=(8, 6))
ax6 = ax5.twinx()

# select specific columns to plot and drop additional NaN
y = pd.DataFrame(sg_df_merged["Adj Close"]).dropna()
z = pd.DataFrame(sg_df_merged["Singapore"]).dropna()

# add plots with markers
curve1 = ax5.plot(y.index, 'Adj Close', data=y, label="Singapore", color = "c", marker='o')
curve2 = ax6.plot(z.index, 'Singapore', data=z, label = "Face Mask Compliance", color = "m", marker='o')

# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
    
# rotate xticks
ax5.xaxis.set_tick_params(rotation=45)

# add a grid to ax5
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)

# create a legend for both axes
curves = curve1 + curve2
labels = [l.get_label() for l in curves]
ax5.legend(curves, labels, loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")

plt.show()
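The legend above is built from the Line2D lists returned by each plot call. An equivalent approach (not part of the original answer) collects the handles and labels from both axes with Axes.get_legend_handles_labels and passes them to a single legend. A minimal sketch with a few rows from the sample data, rendered on the non-interactive Agg backend so it can run headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to show the plot in a window
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the sample data, with NaN already dropped per column
y = pd.Series([3153.72998, 3011.080078, 2481.22998],
              index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]))
z = pd.Series([24.0, 22.0, 23.0],
              index=pd.to_datetime(["2020-02-21", "2020-02-28", "2020-03-06"]))

fig, ax5 = plt.subplots(figsize=(8, 6))
ax6 = ax5.twinx()
ax5.plot(y.index, y, label="Singapore", color="c", marker="o")
ax6.plot(z.index, z, label="Face Mask Compliance", color="m", marker="o")

# gather the labeled artists from both axes and draw one legend on ax5
handles5, labels5 = ax5.get_legend_handles_labels()
handles6, labels6 = ax6.get_legend_handles_labels()
legend = ax5.legend(handles5 + handles6, labels5 + labels6,
                    loc="center left", bbox_to_anchor=(1.1, 0.5), title="Country Index")

print([text.get_text() for text in legend.get_texts()])
```

Unlike concatenating the plot return values, this picks up every labeled artist on each axis, so later additions (for example a second line on ax6) appear without changing the legend code.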

[Image: output of the code above]


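Alternatively, plot each column with pandas and secondary_y=True. Here df is the sample data, read with pd.read_clipboard as shown in the Data section.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# given a datetime64[ns] index whose time components are all 0, keeping only the date
# causes the xticklabels to be centered under the ticks
df.index = df.index.date

# primary axis: Adj Close; secondary (right) axis: Singapore
ax = df['Adj Close'].dropna().plot(marker='.', color='c', grid=True, figsize=(12, 6),
                                   title='My Plot', ylabel='Adj Close', xlabel='Date', legend=True)
ax_right = df['Singapore'].dropna().plot(marker='.', color='m', secondary_y=True, legend=True, rot=0, ax=ax)

ax_right.set_ylabel('Singapore')

# each axis has its own legend; place them one above the other, right of the plot
ax.legend(title='Country Index', bbox_to_anchor=(1.06, 0.5), loc='center left', frameon=False)
ax_right.legend(bbox_to_anchor=(1.06, 0.43), loc='center left', frameon=False)

# major ticks in January and July, minor ticks every month
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_minor_locator(mdates.MonthLocator())

plt.show()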

[Image: output of the code above]
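The df.index = df.index.date step relies on every timestamp being at midnight. A minimal guard for that assumption (dates_if_midnight is a hypothetical helper, not part of pandas) converts only when normalizing the index changes nothing, so timestamps with a real time component are not silently collapsed to their date:

```python
import pandas as pd


def dates_if_midnight(idx: pd.DatetimeIndex) -> pd.Index:
    """Hypothetical helper: return the index as datetime.date objects if every
    timestamp is at midnight, otherwise return it unchanged."""
    # normalize() sets each time component to 00:00:00
    if (idx == idx.normalize()).all():
        return pd.Index(idx.date)
    return idx


midnight = pd.to_datetime(["2020-02-21", "2020-02-25"])
print(dates_if_midnight(midnight))  # object-dtype Index of datetime.date

intraday = pd.to_datetime(["2020-02-21 09:30"])
print(dates_if_midnight(intraday))  # unchanged DatetimeIndex
```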


Data

  • Copy the data below to the clipboard and read it with the following line:
df = pd.read_clipboard(sep=',', index_col=[0], parse_dates=[0]) 
,Adj Close,Singapore
2015-10-01,2998.350098,
2015-11-01,2855.939941,
2015-12-01,2882.72998,
2016-01-01,2629.110107,
2016-02-01,2666.51001,
2016-03-01,2840.899902,
2016-04-01,2838.52002,
2016-05-01,2791.060059,
2016-06-01,2840.929932,
2016-07-01,2868.689941,
2016-08-01,2820.590088,
2016-09-01,2869.469971,
2016-10-01,2813.8701170000004,
2016-11-01,2905.169922,
2016-12-01,2880.76001,
2017-01-01,3046.800049,
2017-02-01,3096.610107,
2017-03-01,3175.110107,
2017-04-01,3175.439941,
2017-05-01,3210.820068,
2017-06-01,3226.47998,
2017-07-01,3329.52002,
2017-08-01,3277.26001,
2017-09-01,3219.909912,
2017-10-01,3374.080078,
2017-11-01,3433.540039,
2017-12-01,3402.919922,
2018-01-01,3533.98999,
2018-02-01,3517.939941,
2018-03-01,3427.969971,
2018-04-01,3613.929932,
2018-05-01,3428.179932,
2018-06-01,3268.699951,
2018-07-01,3319.850098,
2018-08-01,3213.47998,
2018-09-01,3257.050049,
2018-10-01,3018.800049,
2018-11-01,3117.610107,
2018-12-01,3068.76001,
2019-01-01,3190.169922,
2019-02-01,3212.689941,
2019-03-01,3212.879883,
2019-04-01,3400.199951,
2019-05-01,3117.76001,
2019-06-01,3321.610107,
2019-07-01,3300.75,
2019-08-01,3106.52002,
2019-09-01,3119.98999,
2019-10-01,3229.879883,
2019-11-01,3193.919922,
2019-12-01,3222.830078,
2020-01-01,3153.72998,
2020-02-01,3011.080078,
2020-02-21,,24.0
2020-02-25,,
2020-02-28,,22.0
2020-03-01,2481.22998,
2020-03-02,,
2020-03-03,,
2020-03-06,,23.0
2020-03-10,,
2020-03-13,,21.0
2020-03-17,,
2020-03-20,,24.0
2020-03-23,,
2020-03-24,,
2020-03-27,,27.0
2020-03-30,,
2020-03-31,,
2020-04-01,2624.22998,
2020-04-03,,37.0
2020-04-06,,
2020-04-07,,
2020-04-10,,73.0
2020-04-13,,
2020-04-14,,
2020-04-17,,85.0
2020-04-20,,
2020-04-21,,
2020-04-24,,90.0
2020-04-27,,
2020-04-28,,
2020-05-01,2510.75,90.0
2020-05-05,,
2020-05-15,,
2020-05-21,,
2020-05-22,,92.0
2020-05-25,,
2020-05-26,,
2020-05-30,,
2020-06-01,2589.909912,
2020-06-05,,89.0
2020-06-08,,
2020-06-15,,
2020-06-16,,
2020-06-19,,92.0
2020-06-22,,
2020-06-25,,
2020-07-01,2529.820068,
2020-07-03,,
2020-07-06,,
2020-07-07,,90.0
2020-07-12,,
2020-07-14,,
2020-07-20,,92.0
2020-07-26,,
2020-07-27,,
2020-07-31,,
2020-08-01,2532.51001,
2020-08-03,,88.0
2020-08-07,,
2020-08-10,,
2020-08-12,,
2020-08-14,,90.0
2020-08-17,,
2020-08-25,,
2020-08-28,,90.0
2020-08-31,,
2020-09-01,2490.090088,
2020-09-11,2490.090088,
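pd.read_clipboard needs a working clipboard backend (on Linux, for example, xclip or xsel). If that isn't available, the same read_csv options work on the pasted text wrapped in io.StringIO. A sketch with a few of the rows above:

```python
import io

import pandas as pd

# A few rows copied from the sample data; paste the full block here in practice
csv_text = """,Adj Close,Singapore
2020-02-01,3011.080078,
2020-02-21,,24.0
2020-02-25,,
2020-02-28,,22.0
2020-03-01,2481.22998,
"""

# same options as the read_clipboard call: first column parsed into a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=[0], parse_dates=[0])
print(df)
```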