Matplotlib - Scipy/Sklearn Interaction - LinearRegression Error in scipy.linalg._flapack


I am having some issues with the interaction between matplotlib and scipy. This is my understanding of the situation:

  1. The error is that the LinearRegression fit of sklearn throws the "SVD did not converge" error.
  2. After some debugging, I found that the error is raised in scipy\linalg\basic.py, where the dgelsd routine returns an info value different from 0 (-4 in this case). The lapack_func used is, in my case, the Fortran flapack dgelsd.
  3. The error seems to depend on both the size of the input and the pyplot (matplotlib) code, in particular the methods yticks and xticks.
  4. The error first occurred in a multilinear regression problem (info value in scipy\linalg\basic.py equal to 23, positive in that case), but I have written the following script to better outline the issue:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


a = [0.27236845, 0.79433854, 0.05986454, 0.62736383, 0.5732594
    , 0.54175392, 0.92359127, 0.19913404, 0.17357701, 0.10225879
    , 0.94727807, 0.23766063, 0.92438574, 0.10981865, 0.18669187
    , 0.71337215, 0.17843819, 0.98693265, 0.80787247, 0.931572]

b = [1.68869178, 2.20448291, 1.64828788, 1.95276497, 1.23976119, 1.61260175
    , 1.32652345, 1.94535222, 1.37353248, 1.47830833, 1.08400723, 1.91091901
    , 1.63909271, 2.37494003, 1.64490261, 1.90403079, 1.81028796, 1.66986048
    , 1.65304452, 1.60747378]

for no_plot in [True, False]:
    for i in range(len(a)-1):
        _a = a[:i + 2]
        _b = b[:i + 2]
        if not no_plot:
            bar_color = "blue"
            margin = 10
            y_label = x_label = None
            angle = 0
            title = "TestError"
            color_theme = (0 / 235, 32 / 235, 96 / 235)
            fig, ax = plt.subplots(figsize=(18, 6.8))
            plt.bar(_a, _b, color=bar_color)
            box = ax.get_position()
            ax.set_position([box.x0, box.y0 + margin * box.height, box.width, box.height * (1 - margin)])
            plt.xticks(fontname="Cambria", color=color_theme, rotation=angle, fontsize=25)
            plt.yticks(fontname="Cambria", color=color_theme, fontsize=25)
            plt.title(title, fontname="Cambria", color=color_theme, fontsize=25)
            ax_output = plt.gca()
        try:
            reg = LinearRegression().fit(np.array(_a).reshape(-1, 1), _b)
            print("Success: {}, @ i={} with no_plot={}".format(reg.score(np.array(_a).reshape(-1, 1), _b), i, no_plot))
        except Exception as e:
            print("Exception: {} @ i={} with no_plot={}".format(repr(e), i, no_plot))

When run on a Windows 10 machine with:

python version: 3.7.9
scipy version: 1.5.2
scikit-learn version: 0.23.2
numpy version: 1.19.2
matplotlib version: 3.3.2

and _flapack.cp37-win_amd64
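For anyone trying to reproduce this on another machine, the version list above can be gathered with a short script along these lines. This is only a sketch: `report_versions` is a hypothetical helper name, and it assumes each package exposes a `__version__` attribute.

```python
import importlib
import platform


def report_versions(names=("scipy", "sklearn", "numpy", "matplotlib")):
    """Return a dict mapping each package name to its version string.

    Hypothetical helper: it assumes every package exposes `__version__`
    and reports "not installed" when the import fails.
    """
    versions = {"python": platform.python_version()}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print("{} version: {}".format(name, version))
```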

The results are the following:

Success: 1.0, @ i=0 with no_plot=True
Success: 0.9524690407545247, @ i=1 with no_plot=True
Success: 0.9248909415334777, @ i=2 with no_plot=True
Success: 0.17921330631542143, @ i=3 with no_plot=True
Success: 0.1559357435898613, @ i=4 with no_plot=True
Success: 0.001129573837944875, @ i=5 with no_plot=True
Success: 0.008667658302087822, @ i=6 with no_plot=True
Success: 0.001674117195053615, @ i=7 with no_plot=True
Success: 0.011802146118754298, @ i=8 with no_plot=True
Success: 0.024141340568111902, @ i=9 with no_plot=True
Success: 0.04144995409093344, @ i=10 with no_plot=True
Success: 0.03301917468171267, @ i=11 with no_plot=True
Success: 0.0959782634092683, @ i=12 with no_plot=True
Success: 0.08847483030078473, @ i=13 with no_plot=True
Success: 0.06428117850391502, @ i=14 with no_plot=True
Success: 0.07033033186821203, @ i=15 with no_plot=True
Success: 0.06394158828230323, @ i=16 with no_plot=True
Success: 0.0640239869160919, @ i=17 with no_plot=True
Success: 0.06734590831873866, @ i=18 with no_plot=True
Success: 1.0, @ i=0 with no_plot=False
Success: 0.9524690407545247, @ i=1 with no_plot=False
Success: 0.9248909415334777, @ i=2 with no_plot=False
Success: 0.17921330631542143, @ i=3 with no_plot=False
Success: 0.1559357435898613, @ i=4 with no_plot=False
Success: 0.001129573837944875, @ i=5 with no_plot=False
Success: 0.008667658302087822, @ i=6 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=7 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=8 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=9 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=10 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=11 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=12 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=13 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=14 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=15 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=16 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=17 with no_plot=False
Exception: ValueError('illegal value in 4-th argument of internal None') @ i=18 with no_plot=False

As far as the stacktrace is concerned:

Traceback (most recent call last):
File "......./isla.py", line 36, in <module>
reg = LinearRegression().fit(np.array(_a).reshape(-1, 1), _b)
File "......\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
linalg.lstsq(X, y)
File "......\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
% (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None
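Since the traceback ends in scipy.linalg.lstsq, one way to take sklearn out of the picture is to call lstsq directly on the same data. The following is a minimal sketch, not sklearn's actual code: `fit_line_with_lstsq` is a hypothetical helper, and it assumes that LinearRegression with an intercept centres X and y before calling lstsq. The data are the first nine points of a and b from the script above (the first failing case, i=7). Running it after the plotting code, with each of the three drivers that lstsq accepts, would show whether the failure is specific to gelsd.

```python
import numpy as np
from scipy import linalg


def fit_line_with_lstsq(x, y, driver="gelsd"):
    """Fit y = coef * x + intercept by calling scipy.linalg.lstsq directly.

    Hypothetical helper: it centres the data first, which is assumed to
    mirror what LinearRegression(fit_intercept=True) does internally.
    """
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    # lstsq raises LinAlgError or ValueError when LAPACK reports info != 0
    coef, _residues, _rank, _sv = linalg.lstsq(
        X - x_mean, y - y_mean, lapack_driver=driver
    )
    intercept = y_mean - x_mean @ coef
    return coef, intercept


if __name__ == "__main__":
    # First nine points of a and b from the script above (the i=7 case)
    a9 = [0.27236845, 0.79433854, 0.05986454, 0.62736383, 0.5732594,
          0.54175392, 0.92359127, 0.19913404, 0.17357701]
    b9 = [1.68869178, 2.20448291, 1.64828788, 1.95276497, 1.23976119,
          1.61260175, 1.32652345, 1.94535222, 1.37353248]
    for driver in ("gelsd", "gelsy", "gelss"):
        try:
            coef, intercept = fit_line_with_lstsq(a9, b9, driver=driver)
            print("{}: coef={}, intercept={}".format(driver, coef, intercept))
        except Exception as e:
            print("{}: {}".format(driver, repr(e)))
```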

Frankly, I am a little bit lost. Does anybody have any idea on the issue?
