Plot pandas line charts with a dual axis while looping through dataframe columns


I have a dataframe like this:

import pandas as pd

data = {'TIMEFRAME':['9/12/2014 17:52', '10/12/2014 5:02', '10/12/2014  8:04'],
        'Volumetric Flow Meter 1':[0.82, 0.88, 0.9],
        'Pump Speed (RPM)':[2.5,2.7,3.01],
        'Data Source':['raw data','raw data','raw data'],
        'PUMP FAILURE (1 or 0)':[0,0,1]}

df = pd.DataFrame(data)
df

TIMEFRAME        Volumetric Flow Meter 1  Pump Speed (RPM)  Data Source  PUMP FAILURE (1 or 0)
9/12/2014 17:52  0.82                     2.5               raw data     0
10/12/2014 5:02  0.88                     2.7               raw data     0
10/12/2014 8:04  0.90                     3.01              raw data     1
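Before the TIMEFRAME column can act as a time axis, its strings have to become datetimes. The row order (9/12 17:52, then 10/12 5:02, then 10/12 8:04) only increases if the dates are day-first, so this sketch assumes a DD/MM/YYYY HH:MM layout; that is an inference from the sample, not something stated in the question. It also collapses the doubled space in the last timestamp before parsing with an explicit format.

```python
import pandas as pd

data = {'TIMEFRAME': ['9/12/2014 17:52', '10/12/2014 5:02', '10/12/2014  8:04'],
        'Volumetric Flow Meter 1': [0.82, 0.88, 0.9],
        'Pump Speed (RPM)': [2.5, 2.7, 3.01],
        'Data Source': ['raw data', 'raw data', 'raw data'],
        'PUMP FAILURE (1 or 0)': [0, 0, 1]}
df = pd.DataFrame(data)

# Collapse runs of whitespace (the last timestamp has two spaces) into one space
cleaned = df['TIMEFRAME'].str.replace(r'\s+', ' ', regex=True)

# ASSUMPTION: timestamps are day-first (DD/MM/YYYY HH:MM), inferred from the row order
df['TIMEFRAME'] = pd.to_datetime(cleaned, format='%d/%m/%Y %H:%M')

print(df['TIMEFRAME'])
```

If the real data turns out to be month-first, only the format string needs to change.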

I am trying to loop through the dataset and plot every numerical variable individually against the pump failure column to identify trends. I have to create a list of every numerical column in the dataframe and loop through it, plotting each one against the PUMP FAILURE (1 or 0) column.
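Rather than listing the columns by hand, the list of numerical columns can be built with DataFrame.select_dtypes, dropping the failure flag so it is not plotted against itself. Note that the list holds column names (strings), which is what df[...] expects. A minimal sketch on the sample data:

```python
import pandas as pd

data = {'TIMEFRAME': ['9/12/2014 17:52', '10/12/2014 5:02', '10/12/2014  8:04'],
        'Volumetric Flow Meter 1': [0.82, 0.88, 0.9],
        'Pump Speed (RPM)': [2.5, 2.7, 3.01],
        'Data Source': ['raw data', 'raw data', 'raw data'],
        'PUMP FAILURE (1 or 0)': [0, 0, 1]}
df = pd.DataFrame(data)

target = 'PUMP FAILURE (1 or 0)'

# Keep only numeric (int/float) columns, then remove the failure flag itself
numeric_cols = df.select_dtypes(include='number').columns.drop(target).tolist()

for col in numeric_cols:
    print(col)
```

The text column Data Source is skipped automatically, and datetime columns are not matched by include='number' either, so this still works after TIMEFRAME has been converted to datetimes.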

For each plot, I need a dual-axis setup so that I can see the pump failure (0 or 1) on the second y-axis and the attribute on the first y-axis.

The output should look something like this: [graph image not included]

This was my approach:

ListOfVariables=[df["Pump Speed (RPM)"],df["Volumetric Flow Meter 1"]]

for item in ListOfVariables:
    first_axis = df[item].plot #Looping through every item in the dataframe.
    second_axis = first_axis.twinx() #The Twinx function is used to ensure we share the X-Axis for both plots
    second_axis.plot(df['PUMP FAILURE (1 or 0)'], color='teal')
    plt.title(item)
    plt.show()

This doesn't produce the desired output. Any help is appreciated. Thanks.
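The loop above fails for a few reasons: ListOfVariables holds Series objects rather than column names, so df[item] raises a KeyError; df[item].plot is never called (the parentheses are missing), so first_axis would be pandas' plot accessor rather than a matplotlib Axes; and twinx() therefore has nothing to attach to. A minimal correction that keeps the same structure and, like the original, plots against the row index could look like this (the colors are arbitrary choices):

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'TIMEFRAME': ['9/12/2014 17:52', '10/12/2014 5:02', '10/12/2014  8:04'],
        'Volumetric Flow Meter 1': [0.82, 0.88, 0.9],
        'Pump Speed (RPM)': [2.5, 2.7, 3.01],
        'Data Source': ['raw data', 'raw data', 'raw data'],
        'PUMP FAILURE (1 or 0)': [0, 0, 1]}
df = pd.DataFrame(data)

# Store column NAMES (strings), not the Series themselves
list_of_variables = ['Pump Speed (RPM)', 'Volumetric Flow Meter 1']

for item in list_of_variables:
    # A fresh figure per variable; the attribute goes on the first (left) y-axis
    fig, first_axis = plt.subplots()
    df[item].plot(ax=first_axis, color='tab:blue')
    first_axis.set_ylabel(item)
    # twinx() adds a second (right) y-axis that shares the x-axis
    second_axis = first_axis.twinx()
    second_axis.plot(df.index, df['PUMP FAILURE (1 or 0)'], color='teal')
    second_axis.set_ylabel('PUMP FAILURE (1 or 0)')
    first_axis.set_title(item)
    plt.show()
```

Creating the figure explicitly with plt.subplots() also keeps each iteration on its own figure instead of relying on whichever axes happens to be current.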

Best answer:

Use:

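import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


# Demo data: 100 evenly spaced timestamps with random readings
data = {'TIMEFRAME': pd.date_range('9/12/2014 17:52', '10/12/2014 18:04', periods=100),
        'Volumetric Flow Meter 1': np.random.randn(100),
        'Pump Speed (RPM)': np.random.randn(100),
        'Data Source': ['raw data'] * 100,
        # The failure flag is binary, so draw random 0/1 values
        'PUMP FAILURE (1 or 0)': np.random.randint(0, 2, 100)}

df = pd.DataFrame(data)
# Already datetime here; this line matters when TIMEFRAME holds strings
df['TIMEFRAME'] = pd.to_datetime(df['TIMEFRAME'])
cols = df.columns[:-1]  # every column except PUMP FAILURE

# cols[0] is TIMEFRAME and cols[-1] is Data Source, so cols[1:-1] are the readings
for col in cols[1:-1]:
    fig, ax = plt.subplots(figsize=(15,3))
    # The attribute goes on the first (left) y-axis
    ax.plot(df[cols[0]], df[col], color='teal')
    ax.set_ylabel(col, color='teal')
    # twinx() adds a second y-axis that shares the same x-axis
    ax2 = ax.twinx()
    ax2.plot(df[cols[0]], df['PUMP FAILURE (1 or 0)'], color='red')
    ax2.set_ylabel('PUMP FAILURE (1 or 0)', color='red')
    # One major tick every 600 minutes, labelled with date and time
    ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=600))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
    ax.tick_params(axis='x', labelrotation=90)
    ax.set_title(col)
    plt.show()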

With interval=600, a major tick is placed every 600 minutes, i.e. every 10 hours. I tested it with 300 and the result was not as readable. If you want smaller time steps, first increase the figure size.
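Since MinuteLocator(interval=600) simply places a major tick every 600 minutes, mdates.HourLocator(interval=10) expresses the same 10-hour spacing more directly. A small sketch using random demo data over the same date range as the answer; the date format string is an arbitrary choice, and the last lines measure the spacing between the resulting ticks:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Demo data spanning the same range as the answer's example
times = pd.date_range('9/12/2014 17:52', '10/12/2014 18:04', periods=100)
values = np.random.randn(100)

fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(times, values, color='teal')

# A major tick every 10 hours, the same spacing as MinuteLocator(interval=600)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
ax.tick_params(axis='x', labelrotation=90)
fig.tight_layout()

# Tick positions are in Matplotlib date units (days), so multiply by 24 for hours
spacing_hours = np.diff(ax.get_xticks()) * 24
print(spacing_hours[:3])
plt.show()
```

The tick positions may land on different clock hours than with MinuteLocator, but the spacing between them is the same.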

Output: [image of the resulting dual-axis plots not included]