How to extract y-values from my regression line in plotnine (ggplot)


My code looks like this, but I have no idea how to extract the y-values from the regression line, or even how to show the equation of the line.

import plotnine as p9
from scipy import stats

#calculate best fit line

slope, intercept, r_value, p_value, std_err = stats.linregress(df['gross_power'],df['temp_drop_a'])
df['fit']=df.gross_power*slope+intercept

#format text

txt= 'y = {:4.2e} x + {:4.2E};   R^2= {:2.2f}'.format(slope, intercept, r_value*r_value)

#create plot: scatter points, the fitted line in red, and the equation as a text annotation

plot=(p9.ggplot(data=df, mapping= p9.aes('gross_power','temp_drop_a'))
    + p9.geom_point(p9.aes())
    + p9.xlab('Gross Generation (MW)')+ p9.ylab(r'Air Heater Temperature Generator A (F)')
    + p9.geom_line(p9.aes(x='gross_power', y='fit'), color='red')
    + p9.annotate('text', x= 3, y = 35, label = txt))


print(plot)

There is 1 solution below


Your code worked for me. What is the issue you are encountering?

Your predicted y-values are in the fit column of your dataframe. You can extract them as:

y_hat = df['fit']
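
The same slope and intercept also give y-values at x positions that are not in the dataframe. Here is a minimal sketch of that idea. It assumes made-up data lying exactly on y = 2x + 1, and the names x_new and y_new are illustrative only, not part of the question:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up illustrative data on the line y = 2x + 1 (not the question's data)
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [3.0, 5.0, 7.0, 9.0]})

# linregress returns a result object with .slope and .intercept attributes
res = stats.linregress(df['x'], df['y'])

# Fitted values at the observed x positions, as a plain NumPy array
df['fit'] = res.slope * df['x'] + res.intercept
y_hat = df['fit'].to_numpy()

# Fitted values at arbitrary new x positions (hypothetical values)
x_new = np.array([0.0, 2.5, 10.0])
y_new = res.slope * x_new + res.intercept
```

With the question's data, gross_power and temp_drop_a would take the place of x and y.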

To display the equation on the plot, use annotate as you have. Complete code:

import pandas as pd
from scipy import stats
from plotnine import *
from plotnine.data import mtcars as df

slope, intercept, r_value, p_value, std_err = stats.linregress(df['hp'],df['mpg'])
df['fit'] = df.hp*slope+intercept
y_hat = df['fit']

txt= 'y = {:4.2e} x + {:4.2E};   R^2= {:2.2f}'.format(slope, intercept, r_value*r_value)

p = (ggplot(data=df)
  + theme_light(9)
  + geom_point(aes('hp','mpg'), size=1, colour='#E78587')
  + geom_line(aes(x='hp', y='fit'), color='#8B0000')
  + labs(x='Gross horsepower', y='Miles per gallon')
  + annotate('text', x=250, y=5, label=txt, size=8, color='#8B0000')
)
p

Perhaps there is an issue with temp_drop_a in your code. I don't know what this column is, and it doesn't appear in the source dataset. It also doesn't sound like a typical explanatory variable.
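
If the column is the problem, one thing worth checking is whether it holds only numbers, since text or missing entries can make the fit fail or come back as NaN. This is a rough sketch of such a check, not a diagnosis of your data. The frame below is hypothetical: it only borrows the question's column names, and the values are invented:

```python
import pandas as pd
from scipy import stats

# Hypothetical frame reusing the question's column names; values are invented
df = pd.DataFrame({
    'gross_power': [100, 200, 300, 400, 500],
    'temp_drop_a': ['10', '20', 'n/a', '40', None],
})

# Coerce both columns to numbers; unparseable entries become NaN, then drop those rows
cols = ['gross_power', 'temp_drop_a']
clean = df[cols].apply(pd.to_numeric, errors='coerce').dropna()

# Fit only on the rows that survived the cleaning step
res = stats.linregress(clean['gross_power'], clean['temp_drop_a'])
clean['fit'] = res.slope * clean['gross_power'] + res.intercept
```

Comparing len(df) with len(clean) shows how many rows were dropped, which hints at whether bad values were present in the first place.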