How to properly use lm() in R in order to run ANCOVA test?

Question

385 Views Asked by Y_m5437 At 16 June 2025 at 10:24

I am currently working on a project in which I have to run an ANCOVA test on the iris data set that ships with R.

I am trying to figure out how to set up lm() in order to run this test. I do not want the complete answer, as I really want to learn.

So, basically I need to run an ANCOVA using the iris dataset in R. The task asks me to compare Sepal.Length across all three species while adjusting for Sepal.Width.

I have tried everything and nothing is working for me, and I know it's definitely user error.

I'm fairly new to using R, so please be nice.

I currently have:

fit2 <- lm(Sepal.Length ~ Species + Sepal.Width, data = iris_data)

I need to multiply each individual Species by Sepal.Width separately. I even created separate objects so that I could multiply them by the width, but I have gotten countless errors.

The current objects I have created are these:

setosa     <- iris[iris$Species == "setosa", ]
versicolor <- iris[iris$Species == "versicolor", ]
virginica  <- iris[iris$Species == "virginica", ]

Please help steer me in the right direction, thank you! No complete answers, I just need to know how to set it up or maybe I'm unaware of a function that will help me out in this situation.

Any help is appreciated. Thank you

**Len Greski** · Accepted Answer

Analysis of covariance includes both factor and continuous variables as independent variables in a linear model.

For the iris data set, we'd run the following:

  lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

So the original code in the OP is indeed the correct way to set up the analysis. The key point is that the intercept represents the factor level that is not listed in the output (it is the predicted Sepal.Length for that reference level at Sepal.Width = 0), and the coefficients for the other factor levels are interpreted as differences relative to the species represented by the intercept.

Since setosa isn't listed among the regression coefficients, it is represented by the intercept term. The other species coefficients are therefore interpreted as "the effect of Species = virginica on sepal length is x relative to setosa, net of sepal width."

To use the coefficients to predict values of Sepal.Length when Species = setosa, one can ignore the coefficients for virginica and versicolor (i.e. set their indicator variables to 0).
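As a minimal sketch of the reference-level idea, the snippet below lists the factor levels and the coefficient names, then refits the model with a different reference species. The names `iris_relevel` and `fit_relevel` are hypothetical, introduced only for this illustration, and choosing virginica as the new reference is an arbitrary assumption.

```r
# Fit the ANCOVA model on the built-in iris data
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# The first factor level is the reference and is absorbed into the intercept
levels(iris$Species)
names(coef(fit))

# Assumption for illustration: make virginica the reference instead.
# iris_relevel is a hypothetical copy so the built-in data stay untouched.
iris_relevel <- iris
iris_relevel$Species <- relevel(iris_relevel$Species, ref = "virginica")
fit_relevel <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris_relevel)

# Now setosa and versicolor are reported as differences from virginica
names(coef(fit_relevel))
```

Changing the reference level only changes how the species effects are expressed; the fitted values of the two models are the same.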
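To make that arithmetic concrete, here is a small sketch that builds two predictions by hand from the coefficients and compares them with predict(). The width of 3.1 and the names `b`, `width`, `by_hand_setosa`, `by_hand_virginica` and `new_flowers` are assumptions chosen only for illustration.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
b <- coef(fit)

# Hypothetical sepal width used only for illustration
width <- 3.1

# setosa is the reference level, so both species indicators are 0
by_hand_setosa <- b[["(Intercept)"]] + b[["Sepal.Width"]] * width

# virginica: its indicator is 1 (versicolor's is 0), so add its coefficient
by_hand_virginica <- by_hand_setosa + b[["Speciesvirginica"]]

# Compare the hand calculation with predict()
new_flowers <- data.frame(Species = c("setosa", "virginica"),
                          Sepal.Width = width)
cbind(by_hand = c(by_hand_setosa, by_hand_virginica),
      predict = predict(fit, new_flowers))
```

The two columns should agree, which shows that the species coefficients act as shifts of the intercept while the Sepal.Width slope is shared by all species.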

Making predictions

To make predictions with the model, we save the model object and use it with the predict() function.

fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# predict some values
# first, set up the independent variables
Species <- c("setosa", "setosa", "virginica", "versicolor", "setosa")
Sepal.Width <- c(3.1, 3.2, 3.8, 2.9, 3.25)

# next, build a data frame
data <- data.frame(Species, Sepal.Width)

# predict and print
data$predicted <- predict(fit, data)
data

...and the output:

> data
     Species Sepal.Width predicted
1     setosa        3.10  4.742432
2     setosa        3.20  4.822788
3  virginica        3.80  7.251741
4 versicolor        2.90  6.040463
5     setosa        3.25  4.862966
>
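For the test itself, and for the "multiply Species times Sepal.Width" idea from the question, one possible sketch follows. It assumes that the multiplication was meant as an interaction (a separate Sepal.Width slope per species), which the question does not state outright; `fit_int` is a hypothetical name.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Sequential (Type I) F tests: the Species row tests the species effect
# after Sepal.Width has already been accounted for
anova(fit)

# Interaction model: Sepal.Width * Species expands to
# Sepal.Width + Species + Sepal.Width:Species,
# so no manual subsetting or multiplication is needed
fit_int <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)

# F test comparing the common-slope model with the separate-slopes model
anova(fit, fit_int)
```

If the comparison gives little evidence for separate slopes, the simpler common-slope model above is the usual ANCOVA setup; otherwise the species differences depend on sepal width and need to be interpreted with that in mind.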
