Multiple linear regression, training dataset graphs: ValueError: x and y must be the same size

I am running the following code. Plotting the graph for the training dataset raises an error:

    import pandas as pd
    import numpy as np
    df = pd.read_csv('11.csv')
    df.head()
         AT     V        AP          RH       PE
    0   8.34    40.77   1010.84     90.01   480.48
    1   23.64   58.49   1011.40     74.20   445.75
    2   29.74   56.90   1007.15     41.91   438.76
    3   19.07   49.69   1007.22     76.79   453.09
    4   11.80   40.66   1017.13     97.20   464.43
    x = df.drop(['PE'], axis = 1).values
    y = df['PE'].values
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state=0)
    from sklearn.linear_model import LinearRegression
    ml = LinearRegression()
    ml.fit(x_train, y_train)
    y_pred = ml.predict(x_test)
    print(y_pred) 
    import matplotlib.pyplot as plt
    plt.scatter(x_train, y_train, color = 'red')
    plt.plot(x_train, ml.predict(x_test), color = 'green')
    plt.show()

Please help me reshape the 2D array to 1D so that I can plot the graphs. The error is:
**ValueError: x and y must be the same size**

There are 1 best solutions below

EDIT: Now that your question has its format fixed, I'm spotting a few errors, with a common theme: you are using plotting code meant for 1D (single-variable) linear regression on a multiple regression problem.

plt.scatter(x_train, y_train, color = 'red'): x_train holds four columns (AT, V, AP, RH), while y_train holds one value per row, so the two arrays differ in size and scatter raises the ValueError. This is multiple linear regression, so there is no single x-axis to plot against. (For example, you can't put pressure and volume on the same x-axis against temperature on the y-axis. What would the x-axis represent?) I can't tell which plot you actually want, but you can plot one variable at a time. Since x_train is a NumPy array (you called .values), select the column by position rather than by name, e.g. plt.scatter(x_train[:, 0], y_train, color='red') for AT. Or you could plot each variable in a different color on the same graph, though I don't recommend this since the variables have different units.
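
A minimal sketch of the one-variable-at-a-time approach. It assumes synthetic stand-in data instead of your 11.csv (the random values, coefficients, and the output file name are made up for illustration), and it prints the array shapes to show where the size mismatch comes from.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 11.csv (assumption): 4 features, 1 target.
rng = np.random.default_rng(0)
feature_names = ["AT", "V", "AP", "RH"]
x = rng.normal(size=(100, 4))
y = 450 - 2.0 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=0.5, size=100)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# x_train is 2D (rows, 4) and y_train is 1D (rows,), so their sizes differ.
print(x_train.shape, y_train.shape)

# One scatter panel per feature: each column is 1D and as long as y_train.
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for i, (ax, name) in enumerate(zip(axes, feature_names)):
    ax.scatter(x_train[:, i], y_train, color="red", s=10)
    ax.set_xlabel(name)
axes[0].set_ylabel("PE")
fig.savefig("features_vs_pe.png")  # hypothetical output file name
```

Separate panels keep each variable on its own x-axis, which avoids mixing units on one axis.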

plt.plot(x_train, ml.predict(x_test), color = 'green'): Here the lengths differ, not just the number of columns as in the error above. x_train contains the training rows (80% of the data), while ml.predict(x_test) returns one prediction per test row (20%). You should be using y_test for your x-input, e.g. plt.plot(y_test, ml.predict(x_test)), or plt.scatter with the same arguments since the points are not ordered. If that isn't what you wanted (plotting y_test against your predictions may not be the graph you had in mind), you are probably reusing code and assumptions from 1D linear regression while working with multiple linear regression, which is the theme running through these errors.
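
A minimal sketch of a predicted-versus-actual plot, again on synthetic stand-in data rather than your 11.csv (the data, coefficients, and output file name are assumptions for illustration). Both arrays come from the test split, so they have the same length.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 11.csv (assumption): 4 features, 1 target.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = 450 - 2.0 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=0.5, size=100)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

ml = LinearRegression().fit(x_train, y_train)
y_pred = ml.predict(x_test)  # one prediction per test row, same length as y_test

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred, color="red", s=10, label="test samples")

# Reference diagonal: points on this line are predicted exactly.
lo, hi = y_test.min(), y_test.max()
ax.plot([lo, hi], [lo, hi], color="green", label="perfect prediction")

ax.set_xlabel("actual PE")
ax.set_ylabel("predicted PE")
ax.legend()
fig.savefig("predicted_vs_actual.png")  # hypothetical output file name
```

The closer the red points sit to the green diagonal, the better the model fits the test data, and this works for any number of input variables.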