How can I improve R2 score in my regression model? Predicting House Prices


I have trained a regression model on a house-pricing dataset, and I'm getting an R2 score of nearly 0.5. How can I improve this R2 score and get predictions closer to the actual prices? Here are the actual prices (Price) and the predicted prices (yhat):

        Price          yhat
0     5250000  7.904547e+06
1     3950000  5.272666e+06
2     9500000  1.541611e+07
3     8000000  1.135316e+07
4    12750000  9.812656e+06
..        ...           ...
99    7000000  9.222798e+06
100   6750000  7.002278e+06
101   6500000  8.844441e+06
102   6500000  7.946185e+06
103   5275000  1.005468e+07

[104 rows x 2 columns]

While predicting prices, I noticed that some districts seem to respond to the features differently from other districts. So this came to my mind: maybe I can fit a separate equation for each district, optimizing the coefficient of every feature per district, and then take an average over all 104 equations. That way I could minimize the difference between yhat and the actual price. Is there a method for this in machine learning?
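One way to make the "separate equation per district" idea concrete is to fit one regression per district and predict each house with its own district's equation, instead of averaging the equations. The sketch below is only an illustration under loud assumptions: the rows are synthetic, there is a single feature (square meters), and the helpers `fit_line` and `r2_score` are hypothetical names written in plain Python so the example has no dependencies. It compares a pooled fit with per-district fits.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Synthetic rows, invented for this sketch: (district, square_meters, price).
rows = [
    ("A", 60, 3.0e6), ("A", 80, 4.1e6), ("A", 100, 5.0e6), ("A", 120, 5.9e6),
    ("B", 60, 6.2e6), ("B", 80, 8.9e6), ("B", 100, 12.1e6), ("B", 120, 15.0e6),
]
xs = [x for _, x, _ in rows]
ys = [y for _, _, y in rows]

# Pooled model: one intercept and one slope shared by every district.
a, b = fit_line(xs, ys)
pooled_r2 = r2_score(ys, [a + b * x for x in xs])

# Per-district models: each district gets its own intercept and slope.
groups = defaultdict(list)
for district, x, y in rows:
    groups[district].append((x, y))
models = {
    d: fit_line([x for x, _ in pts], [y for _, y in pts])
    for d, pts in groups.items()
}
district_pred = [models[d][0] + models[d][1] * x for d, x, _ in rows]
district_r2 = r2_score(ys, district_pred)

print(f"pooled R2:       {pooled_r2:.3f}")
print(f"per-district R2: {district_r2:.3f}")
```

Both scores here are in-sample, so the per-district number is optimistic. With many features and only a few houses per district, separate fits can overfit badly, so it is worth checking them on held-out data. A gentler variant of the same idea is to keep one model and add interaction terms between the district indicators and the features whose effect seems to vary by district.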

[-1720170.96599959  -450112.9969811   -241807.76731698     7269.75674741
    66318.19738872  1220655.10520284   881134.39040993  2901100.72147558
  1256062.85997242  1831204.62088706  -707473.49663603  2799885.41237361
   797886.35464379] -10796920.246326108
Encoded Values for Karlibayir Mh.: district0=0, district1=5, district2=0
for your desired values: Karlibayir Mh. 90 2 2 5 1 1 1 1 3 3 The predicted price would be:
Predicted Price: 10660621.44001897

As you can see above, my code takes some parameters from the user and predicts a price from them: district, square meters of the house, building age, floor of the apartment, number of floors in the building, number of bathrooms, whether there is an elevator, whether there is parking, whether the house is on a steep alley, quality of the materials used in the apartment, and how luxurious the vicinity is.

I haven't tried random forest, XGBoost, or other models on the dataset yet. Finally, and surprisingly, when I use K-fold cross-validation I get lower R2 scores, and I don't know why.
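A possible explanation for the lower K-fold scores: each fold's R2 is computed on rows the model did not see during fitting, so it is usually lower, and noisier, than a score computed on the training rows or on a single favorable split. The sketch below shows the mechanics on synthetic data, not the real dataset. It uses a single feature, plain Python with no libraries, and the hypothetical helpers `fit_line`, `r2_score`, and `k_fold_r2`; the noise level is an assumption chosen to give an R2 in the same rough range as the question.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold; returns k R2 values."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in fold]
        scores.append(r2_score([ys[i] for i in fold], preds))
    return scores


# Synthetic data, invented for this sketch: 104 houses, price driven by
# square meters plus a lot of noise.
rng = random.Random(42)
xs = [rng.uniform(50, 200) for _ in range(104)]
ys = [50_000 * x + rng.gauss(0, 2.5e6) for x in xs]

a, b = fit_line(xs, ys)
train_r2 = r2_score(ys, [a + b * x for x in xs])
cv_scores = k_fold_r2(xs, ys, k=5)
cv_mean = sum(cv_scores) / len(cv_scores)

print(f"in-sample R2:       {train_r2:.3f}")
print(f"5-fold R2 per fold: {[round(s, 3) for s in cv_scores]}")
print(f"5-fold mean R2:     {cv_mean:.3f}")
```

With only about a hundred rows, each held-out fold contains roughly twenty houses, so the per-fold scores can swing widely. The mean across folds is generally the more trustworthy estimate of how the model will do on new houses, and it is a fairer yardstick when comparing linear regression against random forest or boosted models later.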

I appreciate your responses and time in advance.

By the way, I also made a correlation heat map of the features (image not included).
