Plotting two lines of best fit, and finding intersection point, on a 'dogleg' plot uisng Seaborn

83 Views Asked by At

I am trying to fit two lines of best fit on this plot, and to find the intersection point of these lines. This will be used in conjunction with a seaborn.facetgrid to show a trend across years.

Dogleg plot to fit two LOBF on - one following steep decline, one showing the start of a plateau

I have used sns.lmplot, and sns.regplot, each delivering 1 line of best fit. I have used regplot alongside the inbuilt 'hue' function, with manually chosen data points to generate two lines of best fit - but I was wondering if this can be automated and therefore, not reliant on human selection of datapoints.

This was created using sns.regplot, with hue deciding the colouring of datapoints at the LOBF plotted based on colour:

This was created using sns.regplot, with hue deciding the colouring of datapoints at the LOBF plotted based on colour

1

There are 1 best solutions below

0
Muhammed Yunus On

Since you want to find the knee point and lines automatically, you can fit a curve and then use that information to separate out the steep part from the shallow part. Each of those defines a line, for which you can find the intersection.

The example below shows how to do this for some test data.

enter image description here

Imports and test data:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

#
#Make some test data
#
np.random.seed(0)
x0 = np.random.uniform(size=150)
y0 = -1.5 * x0 + 3 + np.random.randn(150) / 7
x1 = 0.8 + np.random.uniform(size=80) * 1.7 / 3
y1 = -0.5 * x1 + 2 + np.random.randn(80) / 13
x2 = 1.5 + np.random.randn(60) / 3
y2 = -0.2 * x2 + 1.6 + np.random.randn(60) / 15

data = pd.DataFrame({
    'x': np.concatenate([x0, x1, x2]).tolist(),
    'y': np.concatenate([y0, y1, y2]).tolist(),
    'category': list('A' * len(x0) + 'B' * len(x1) + 'C' * len(x2))
})

# Plot the data
ax = sns.scatterplot(data, x='x', y='y', hue='category')

Find two lines and their intersection, and plot.

#Fit an initial quadratic curve to all of the data
poly_fit = np.polynomial.Polynomial.fit(data['x'], data['y'], deg=2)

#Get the curve's values
x_fine = np.linspace(data['x'].min(), data['x'].max(), num=100)
curve = poly_fit(x_fine)
gradient = poly_fit.deriv()(x_fine)

#Since we expect to be fitting a knee in the curve,
# we can incorporate that prior knowledge by
# ignoring the 'upswing' part the curve and re-fitting
# to where the gradients are < shallowest point

#Filter data to points before the gradient minimum
min_pt = x_fine[np.abs(gradient) == np.abs(gradient).min()].item()
x_fine = x_fine[x_fine < min_pt]
data_x = data.loc[data['x'] < min_pt, 'x'].values
data_y = data.loc[data['x'] < min_pt, 'y'].values

#Re-fit curve
poly_fit = np.polynomial.Polynomial.fit(data_x, data_y, deg=2)
curve = poly_fit(x_fine)
gradient = poly_fit.deriv()(x_fine)

#Plot the curve (coloured by gradient)
ax.scatter(x_fine, curve, c=gradient, cmap='Greys_r', zorder=2, s=100, label='curve')
ax.legend()
ax_lims = ax.axis()

#Find 2 lines (a, and b)
# The first line is the average of lower quartile of gradients,
# and the second line is the average of the upper quartile of gradients
slope_a = gradient[gradient < np.quantile(gradient, 0.25)].mean()
slope_b = gradient[gradient > np.quantile(gradient, 0.75)].mean()

#Get an (x, y) point that each line goes through
x0_a, x0_b = [x_fine[np.abs(gradient - slope).argmin()] for slope in [slope_a, slope_b]]
y0_a, y0_b = [poly_fit(x0) for x0 in [x0_a, x0_b]]

#Intercept of each line
# y0 = slope * x0 + intercept -> intercept = y0 - slope * x0
intercept_a = y0_a - slope_a * x0_a
intercept_b = y0_b - slope_b * x0_b

#Final line equations
line_a = slope_a * x_fine + intercept_a
line_b = slope_b * x_fine + intercept_b

#Plot the lines
for label, line, slope, intercept in [('a', line_a, slope_a, intercept_a),
                                      ('b', line_b, slope_b, intercept_b)]:
    ax.plot(
        x_fine, line, color='red', alpha=0.25, linewidth=8,
        label=f'$y_{label}={slope:.2f}x+{intercept:.2f}$'
    )
ax.axis(ax_lims)

#Find their intersection point, and plot
intersect_x = (intercept_b - intercept_a) / (slope_a - slope_b)
intersect_y = slope_a * intersect_x + intercept_a
ax.plot(
    intersect_x, intersect_y, marker='X', color='tab:red', linestyle='none',
    markersize=12, label=f'intersection ({intersect_x:.2f}, {intersect_y:.2f})',
)
ax.legend(fontsize=9)
ax.figure.set_size_inches(8, 4)

#Optionally remove the spines
[ax.spines[spine].set_visible(False) for spine in ['right', 'top']]