I tried predicting on 5 rows of the dataset, but why does it keep predicting on the whole dataset?

I built an lm model in R on 65,000 rows (mydata), and I want to see the predictions for only the first 5 rows to check how well my model predicts. Below is the code I wrote to do this, but it keeps predicting the values for all 65,000 rows. Can someone help me?

lm_model2002 <- lm(`AC: Volume` ~ `Market Area (L1)`,data=mydata)
summary(lm_model2002) 
df = head(data.frame(`Market Area (L1)`=mydata$`Market Area (L1)`),5)
predict(lm_model2002,newdata=df)

But now the real problem: I took the first row of mydata and copied it 5 times, then I made a vector that ranges from 1 to 2 and replaced one of the variables (price_per_unit) with that vector. This way I want to predict the exact same row, changing only the price, so that I can plot how the prediction evolves as the price increases:

lm_model3204<- lm(`AC: Volume` ~ log(price_per_unit)*(Cluster_country_hierarchical+`Loyalty-cumulative-volume-10`+`Loyalty-cumulative-orders-10`+`Loyalty-number-of-order-10`+price_discount+Incoterms)+Cluster_spg*(price_discount+Cluster_country_hierarchical)+price_discount*(Month+`GDP per capita`+`Loyalty-cumulative-orders-10`+`Loyalty-cumulative-volume-10`)+`Payer CustGrp`+`CRU Index`,data = mydata)
summary(lm_model3204)
test_data <- mydata[1:1,] 
df <- data.frame(test_data,ntimes=c(5)) 
df <- as.data.frame(lapply(df, rep, df$ntimes)) 
priceperunit<-seq(1,2,by=0.25) 
df$price_per_unit<-priceperunit 
pred <- predict(lm_model3204,newdata=df) 

There is 1 best solution below

Please use a minimal reproducible example next time you post a question.

You just have to pass the first five rows as newdata. Here is an example with the built-in iris dataset:

data("iris")

lm_model2002 <- lm(Sepal.Length ~ Sepal.Width,data=iris)
summary(lm_model2002)

predict(lm_model2002,newdata=iris[1:5,])

output:

> predict(lm_model2002,newdata=iris[1:5,])
       1        2        3        4        5 
5.744459 5.856139 5.811467 5.833803 5.722123 

Or:

df <- head(iris,5)
predict(lm_model2002,newdata=df)
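
One thing worth checking in your original code, although I can't confirm it without your data: data.frame() rewrites non-syntactic column names by default (check.names = TRUE), so a column created as `Market Area (L1)` ends up as Market.Area..L1. and no longer matches the name in the model formula. When predict() can't find a variable in newdata, it looks for it in the formula's environment instead, which might explain getting predictions for every row (for example if mydata had been attached). The sketch below uses a made-up toy data frame and a shortened column name, both of which are my assumptions and not your data.

```r
# Hypothetical toy data standing in for mydata, with a non-syntactic column name
set.seed(1)
toy <- data.frame(`Market Area` = runif(20), check.names = FALSE)
toy$Volume <- 2 * toy$`Market Area` + rnorm(20)

fit <- lm(Volume ~ `Market Area`, data = toy)

# Default check.names = TRUE rewrites the name, so it no longer matches the formula
mangled <- head(data.frame(`Market Area` = toy$`Market Area`), 5)
names(mangled)  # "Market.Area"

# check.names = FALSE keeps the original name, so predict() finds the column
kept <- head(data.frame(`Market Area` = toy$`Market Area`, check.names = FALSE), 5)
names(kept)     # "Market Area"

pred_kept <- predict(fit, newdata = kept)

# Subsetting the original data frame avoids the renaming issue altogether
pred_rows <- predict(fit, newdata = toy[1:5, ])

length(pred_kept)
all.equal(pred_kept, pred_rows)
```

Subsetting the original data frame, as in the iris examples above, is usually the simpler route because the column names stay exactly as the model saw them.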

EDIT

After your last comment: to see how the prediction changes when you vary one of the independent variables, repeat one row and overwrite that variable:

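data(iris)

# Repeat the first row five times
df <- iris[rep(1,5),]
# Overwrite one predictor with a sequence from 1 to 2 (five values)
Petal_Length<-seq(1,2,by=0.25)
df$Petal.Length<-Petal_Length

# The model must include the variable that is being varied
lm_model3204 <- lm(Sepal.Length ~ Petal.Length+Sepal.Width,data=iris)
# One prediction per row of df, differing only through Petal.Length
pred <- predict(lm_model3204,newdata=df)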
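
Since you want to plot the evolution, here is a minimal sketch of one way to put the varied input next to its prediction and plot it. The names fit, grid, result and step_effect are just illustrative choices of mine. Note that in this iris model there are no interaction terms, so the predictions move in equal steps; your own model interacts log(price_per_unit) with other variables, so the curve there will not be a straight line in price.

```r
data(iris)

# Same two-predictor model as in the edit above
fit <- lm(Sepal.Length ~ Petal.Length + Sepal.Width, data = iris)

# Five copies of row 1, differing only in Petal.Length
grid <- iris[rep(1, 5), ]
grid$Petal.Length <- seq(1, 2, by = 0.25)

# Keep the varied input and its prediction side by side
result <- data.frame(Petal.Length = grid$Petal.Length,
                     pred = unname(predict(fit, newdata = grid)))
print(result)

# With no interaction terms, each 0.25 step shifts the prediction by 0.25 * slope
step_effect <- 0.25 * unname(coef(fit)["Petal.Length"])
all.equal(diff(result$pred), rep(step_effect, 4))

# Line-and-point plot of prediction against the varied predictor
plot(result$Petal.Length, result$pred, type = "b",
     xlab = "Petal.Length", ylab = "Predicted Sepal.Length")
```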