How to enter a linear model with many factors

It may be a basic question, but I could not seem to find a solution anywhere. If we have a data frame with 100 factors (call them a1 to a100), how could a linear model be entered in R? I understand you could write

lm(y~ a1*...*a100)

but if the names are long, it would take a long time to write them all out. Is there a faster way? For example, by referencing columns or something similar? Somewhat related, if I get a data table with a column name that involves parentheses (e.g. y-max()), how could I enter that? It reads as a function in R, but it is not.

I apologize if this has already been asked, but I could not seem to find an answer.

Thank you all in advance

---Edit---

Thank you for the answers. However, if I did want higher-order interaction terms, how would I accomplish that? Would I need to write a script or is there a smarter way?

There are 2 best solutions below


If you want to include all the other variables, y ~ . is enough. If you want only selected variables, say columns 2 to 50 and 52 to 101, you can do something like this:

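# Select the predictor columns by position (adjust the indices as needed)
vars <- names(df)[c(2:50, 52:101)]
# Join the names into the right-hand side of the formula
covs <- paste(vars, collapse = " + ")
# Assemble the formula string, convert it, and fit the model
model <- paste("y ~", covs)
df.lm <- lm(as.formula(model), data = df)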
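
For anyone who wants to run the idea end to end, here is a minimal sketch on made-up data. The data frame, its column names, and the simulated values are invented for illustration. reformulate() is base R's helper for building a formula from a character vector, shown here as an alternative to pasting strings.

```r
# Toy data, invented for illustration: response y plus predictors a1..a5
set.seed(1)
df <- data.frame(y = rnorm(20), matrix(rnorm(100), nrow = 20))
names(df) <- c("y", paste0("a", 1:5))

# Pick predictors by column position (here: skip a3, which is column 4)
vars <- names(df)[c(2:3, 5:6)]

# Option 1: paste the names together and convert the string to a formula
f1 <- as.formula(paste("y ~", paste(vars, collapse = " + ")))

# Option 2: let reformulate() build the same formula from the name vector
f2 <- reformulate(vars, response = "y")

fit1 <- lm(f1, data = df)
fit2 <- lm(f2, data = df)
```

Both fits should give the same coefficients. The second option simply avoids assembling the formula string by hand.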

Many of these things can be worked out by reading the "An Introduction to R" manual that ships with R.

Generally, a factor with many levels is stored as a single variable:

treat <- c("control", "placebo", "placebo", "control", "drugA", "control", 
           "drugB", ...)

If so, you can just use lm(y ~ treat), and R will expand the factor into dummy (contrast) variables for you. On the other hand, if you have a data frame containing only y and a1 through a100, you can use lm(y ~ ., data = my.data), where the dot stands for every column other than y.
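
The two cases above can be tried on toy data, as in the sketch below. All data and object names except treat and my.data are invented for illustration. The last two fits are an addition, not part of the answer: they touch the follow-up points from the question, using .^2 for main effects plus all two-way interactions and backticks for a column name that is not a syntactic R name.

```r
# Toy data, invented for illustration
set.seed(42)
treat <- factor(rep(c("control", "placebo", "drugA", "drugB"), each = 5))
d1 <- data.frame(y = rnorm(20), treat = treat)
my.data <- data.frame(y = rnorm(20), a1 = rnorm(20),
                      a2 = rnorm(20), a3 = rnorm(20))

# One factor with four levels: intercept plus three dummy coefficients
fit.treat <- lm(y ~ treat, data = d1)

# The dot expands to every column of my.data except the response
fit.all <- lm(y ~ ., data = my.data)

# .^2 adds all pairwise interactions to the main effects
# (with 100 predictors this produces thousands of terms, so use with care)
fit.two <- lm(y ~ .^2, data = my.data)

# Backticks let a formula refer to a non-syntactic column name
odd <- data.frame(rnorm(20), rnorm(20))
names(odd) <- c("y-max()", "a1")
fit.odd <- lm(`y-max()` ~ a1, data = odd)
```

Raising the dot to a higher power, for example .^3, extends the same idea to three-way interactions.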