I am running the following code, and plotting the graph for the training dataset gives an error:
import pandas as pd
import numpy as np
df = pd.read_csv('11.csv')
df.head()
      AT      V       AP     RH      PE
0   8.34  40.77  1010.84  90.01  480.48
1  23.64  58.49  1011.40  74.20  445.75
2  29.74  56.90  1007.15  41.91  438.76
3  19.07  49.69  1007.22  76.79  453.09
4  11.80  40.66  1017.13  97.20  464.43
x = df.drop(['PE'], axis = 1).values
y = df['PE'].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state=0)
from sklearn.linear_model import LinearRegression
ml = LinearRegression()
ml.fit(x_train, y_train)
y_pred = ml.predict(x_test)
print(y_pred)
import matplotlib.pyplot as plt
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, ml.predict(x_test), color = 'green')
plt.show()
Please help me reshape the 2D array to 1D for plotting the graphs. The error is:
**ValueError: x and y must be the same size**
EDIT: Now that your question has its format fixed, I'm spotting a few errors, with a common theme: you are using 1D linear regression code to plot a multiple regression problem.
plt.scatter(x_train, y_train, color = 'red'): You're trying to plot multiple variables (AT, V, AP, RH) on one axis by passing x_train. This can't work because this is multiple linear regression: x_train has four columns while y_train has a single value per row, so the two arrays differ in size and matplotlib raises the ValueError. (For example, one can't fit pressure and volume on the x-axis against temperature on the y-axis. What would the x-axis represent? It doesn't make sense.) I can't suggest an exact replacement since I don't know what you're trying to show. You can try one variable at a time, e.g. plt.scatter(x_train[:, 0], y_train, color='red') for AT. Note that x_train is a NumPy array (you called .values), so columns are selected by position, not by name. Or you can plot each variable in a different color on the same graph, though I don't recommend this since the variables on your x-axis could have different units.

plt.plot(x_train, ml.predict(x_test), color = 'green'): ml.predict(x_test) returns one value per test row, so the x-input must have the same length. You could use y_test, e.g. plt.plot(y_test, ml.predict(x_test)), although plt.scatter is usually more readable here because the points are not sorted. This is a problem with the length (rows) of your data, not the width (columns) as in the error above. If my suggestion isn't what you wanted (it's a little strange to plot y_test against your y predictions), you might be carrying over assumptions or code from 1D linear regression while working with multiple linear regression, which is the theme running through these errors.
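Both suggestions can be sketched together as follows. This is a minimal, hedged example rather than a drop-in fix: since 11.csv isn't available here, it uses synthetic random data as a stand-in (the 200 rows and the coefficients are made up purely for illustration), and the one-panel-per-feature layout is only one possible way to look at the data. Only the column names AT, V, AP, RH and PE come from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for 11.csv (four features, one target).
# Replace x and y with df.drop(['PE'], axis=1).values and df['PE'].values.
rng = np.random.default_rng(0)
feature_names = ["AT", "V", "AP", "RH"]
x = rng.normal(size=(200, 4))
y = x @ np.array([-2.0, -0.3, 0.1, -0.2]) + 450 + rng.normal(scale=0.5, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
ml = LinearRegression().fit(x_train, y_train)
y_pred = ml.predict(x_test)

# Source of the ValueError: x_train is 2D with 4 columns, y_train is 1D.
print(x_train.shape, y_train.shape)  # (160, 4) (160,)

# One scatter panel per feature. x_train is a NumPy array, so each
# column is selected by position with x_train[:, i].
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for i, (ax, name) in enumerate(zip(axes, feature_names)):
    ax.scatter(x_train[:, i], y_train, color="red", s=8)
    ax.set_xlabel(name)
axes[0].set_ylabel("PE")

# Actual vs. predicted on the test set: both arrays have one value per
# test row, so their sizes match. Points near the diagonal are good fits.
fig2, ax2 = plt.subplots()
ax2.scatter(y_test, y_pred, color="green", s=8)
lims = [y_test.min(), y_test.max()]
ax2.plot(lims, lims, color="black")
ax2.set_xlabel("actual PE")
ax2.set_ylabel("predicted PE")

plt.show()
```

No reshaping from 2D to 1D is needed. Selecting a single column with x_train[:, i] already gives a 1D array of the same length as y_train.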