How do I properly use lm() in R to run an ANCOVA test?

I am currently working on a project in which I have to run an ANCOVA test with the iris data set that is built into R.

I am trying to figure out how to set up lm() in order to run this test. I do not want the complete answer, as I really want to learn.

So, basically, I need to run an ANCOVA using the iris dataset in R. The task is to compare Sepal.Length across all three species while adjusting for Sepal.Width.

I have tried everything and nothing is working for me, and I know it's definitely user error.

I'm fairly new to R, so please be nice.

I currently have:

fit2 <- lm(Sepal.Length ~ Species + Sepal.Width, data = iris_data) 

I think I need to multiply each individual Species by Sepal.Width separately. I even created separate objects so that I could multiply them by the width, but I have gotten countless errors.

The current objects I have created are these:

setosa     <- iris[iris$Species == "setosa", ]
versicolor <- iris[iris$Species == "versicolor", ]
virginica  <- iris[iris$Species == "virginica", ]

Please help steer me in the right direction, thank you! No complete answers, I just need to know how to set it up or maybe I'm unaware of a function that will help me out in this situation.

Any help is appreciated. Thank you

Best answer

Analysis of covariance includes both factor and continuous variables as independent variables in a linear model.

For the iris data set, we'd run the following:

  lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
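The lm() call only fits the model. To get the test of the species effect adjusted for Sepal.Width, one common option is to compare nested models with anova(). The sketch below is a suggestion, not part of the original answer, and the object names (fit_reduced, fit_ancova, fit_slopes) are made up for illustration. The third model uses the formula operator `*`, which is how R expresses "Species times Sepal.Width", so no manual multiplication or subsetting by species is needed.

```r
# Reduced model: covariate only, no species effect
fit_reduced <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# ANCOVA model: common Sepal.Width slope, separate intercept per species
fit_ancova <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Interaction model: the Sepal.Width slope is allowed to differ by species
fit_slopes <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)

# F-test for Species after adjusting for Sepal.Width
species_test <- anova(fit_reduced, fit_ancova)
print(species_test)

# F-test of the equal-slopes assumption behind the ANCOVA model
slopes_test <- anova(fit_ancova, fit_slopes)
print(slopes_test)
```

If the second test gives little evidence of differing slopes, the simpler common-slope model is usually the one reported.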

So the code in the original post is already the correct way to set up the analysis. The key to reading the output is that the intercept represents the factor level that is not listed among the coefficients (the reference level), and the coefficients for the other factor levels are interpreted as differences relative to that reference species.

Since the setosa species isn't listed in the regression coefficients list, it is represented by the intercept term. Therefore, the other species coefficients are interpreted as "the effect of Species = virginica on sepal length is x relative to setosa, holding sepal width constant."
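To see the reference level in the output, the coefficient table can be printed directly. This is a minimal sketch. The releveled model at the end is only an illustration of how the reference species could be changed, and the names iris_releveled and fit_releveled are hypothetical.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# The first factor level is the reference level absorbed into the intercept
levels(iris$Species)

# Coefficient table: there is no row for setosa, and the rows for the
# other species are differences from setosa at the same Sepal.Width
summary(fit)$coefficients

# Illustration only: make virginica the reference level instead
iris_releveled <- transform(iris, Species = relevel(Species, ref = "virginica"))
fit_releveled <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris_releveled)
coef(fit_releveled)
```

Changing the reference level changes which species the intercept stands for, but it should leave the Sepal.Width slope and the fitted values unchanged.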

To use the coefficients to predict values of Sepal.Length when Species = setosa, one can ignore the coefficients for virginica and versicolor (i.e. set their dummy variables to 0).
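As a rough illustration of that arithmetic, the prediction can be built by hand from the coefficients and compared with predict(). The variable names (b, width, manual_setosa, manual_virginica, new_obs) are made up for this sketch.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
b <- coef(fit)

width <- 3.1

# setosa is the reference level: both species dummies are 0,
# so only the intercept and the Sepal.Width slope contribute
manual_setosa <- b[["(Intercept)"]] + b[["Sepal.Width"]] * width

# virginica: same line, shifted by the virginica coefficient
manual_virginica <- manual_setosa + b[["Speciesvirginica"]]

# Compare with predict() on the same two observations
new_obs <- data.frame(Species = c("setosa", "virginica"), Sepal.Width = width)
auto <- predict(fit, new_obs)
all.equal(c(manual_setosa, manual_virginica), unname(auto))
```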

Making predictions

To make predictions with the model, we save the model object and use it with the predict() function.

fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# predict some values
# first, set up the independent variables
Species <- c("setosa", "setosa", "virginica", "versicolor", "setosa")
Sepal.Width <- c(3.1, 3.2, 3.8, 2.9, 3.25)

# next, build a data frame
data <- data.frame(Species, Sepal.Width)

# predict and print
data$predicted <- predict(fit, data)
data

...and the output:

> data
     Species Sepal.Width predicted
1     setosa        3.10  4.742432
2     setosa        3.20  4.822788
3  virginica        3.80  7.251741
4 versicolor        2.90  6.040463
5     setosa        3.25  4.862966
>