I have a dataframe like this:
import pandas as pd
from matplotlib import pyplot as plt
data = {
"TIMEFRAME": ["9/12/2014 17:52", "10/12/2014 5:02", "10/12/2014 8:04"],
"Volumetric Flow Meter 1": [0.82, 0.88, 0.9],
"Pump Speed (RPM)": [2.5, 2.7, 3.01],
"Data Source": ["raw data", "raw data", "raw data"],
"PUMP FAILURE (1 or 0)": [0, 0, 1],
}
df = pd.DataFrame(data)
df
TIMEFRAME        Volumetric Flow Meter 1  Pump Speed (RPM)  Data Source  PUMP FAILURE (1 or 0)
9/12/2014 17:52  0.82                     2.5               raw data     0
10/12/2014 5:02  0.88                     2.7               raw data     0
10/12/2014 8:04  0.90                     3.01              raw data     1
I am trying to loop through the dataset and plot every numerical variable individually against Pump Failure to identify trends. I have to create a list of every numerical column in the dataframe and loop through it, plotting each one against the PUMP FAILURE (1 or 0) column.
Each plot needs a dual axis, so that Pump Failure (0 or 1) is on the second Y-axis and the attribute is on the first Y-axis.
The output is something like this,
Given code:
ListOfVariables=[df["Pump Speed (RPM)"],df["Volumetric Flow Meter 1"]]
for item in ListOfVariables:
    first_axis = df[''].plot #Looping through every item in the dataframe
    second_axis = first_axis.twinx() #The Twinx function is used to ensure we share the X-Axis for both plots
    second_axis.plot(df['PUMP FAILURE (1 or 0)'], color='teal')
    plt.title(item)
    plt.show()
I am confused about this part: first_axis = df[''].plot
I am not sure what to use there.
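Replace:
ListOfVariables=[df["Pump Speed (RPM)"],df["Volumetric Flow Meter 1"]]
with:
ListOfVariables=["Pump Speed (RPM)","Volumetric Flow Meter 1"]
and replace:
first_axis = df[''].plot
with:
first_axis = df[item].plot()
The list now holds column names rather than the columns themselves, so item can be used both to select the column with df[item] and as the title in plt.title(item). Note the parentheses on plot(): without them you get the plot accessor instead of an Axes, and the call to twinx() fails. With these two changes your code works.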
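For reference, here is a minimal sketch of the whole thing put together, using the sample data from the question. The helper name plot_against_failure and the FAILURE_COL constant are made up for illustration, and two things go beyond the original loop as my own assumptions: each plot gets its own figure via plt.subplots(), so successive plots cannot land on the same axes, and the list of numeric columns is built with select_dtypes when none is passed in.

```python
import pandas as pd
from matplotlib import pyplot as plt

FAILURE_COL = "PUMP FAILURE (1 or 0)"  # hypothetical constant, named here for convenience


def plot_against_failure(df, columns=None, failure_col=FAILURE_COL):
    """Plot each column on the first Y-axis and the failure flag on the second."""
    if columns is None:
        # Assumption: "numerical" means every numeric dtype except the failure flag.
        numeric = df.select_dtypes(include="number").columns
        columns = [c for c in numeric if c != failure_col]

    results = []
    for item in columns:
        # One figure per variable keeps the plots independent of each other.
        fig, first_axis = plt.subplots()
        df[item].plot(ax=first_axis, color="tab:blue")
        first_axis.set_ylabel(item)

        # twinx() creates a second Y-axis that shares the same X-axis.
        second_axis = first_axis.twinx()
        second_axis.plot(df[failure_col], color="teal")
        second_axis.set_ylabel(failure_col)

        first_axis.set_title(item)
        results.append((fig, first_axis, second_axis))
    return results


data = {
    "TIMEFRAME": ["9/12/2014 17:52", "10/12/2014 5:02", "10/12/2014 8:04"],
    "Volumetric Flow Meter 1": [0.82, 0.88, 0.9],
    "Pump Speed (RPM)": [2.5, 2.7, 3.01],
    "Data Source": ["raw data", "raw data", "raw data"],
    "PUMP FAILURE (1 or 0)": [0, 0, 1],
}
df = pd.DataFrame(data)

results = plot_against_failure(df)
# In a script, call plt.show() here to display the figures.
```

You can also pass the list explicitly, e.g. plot_against_failure(df, ["Pump Speed (RPM)", "Volumetric Flow Meter 1"]), which matches the ListOfVariables approach above.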