I once saw a linear model fitting written as follows:
lm(formula = Ozone ~ Solar.R + Wind + Temp + I(Wind^2) + I(Temp^2) +
I(Wind * Temp) + I(Wind * Temp^2) + I(Temp * Wind^2) + I(Temp^2 *
Wind^2), data = airquality)
I am not sure what I() means here. For example, what does I(Wind * Temp^2) do, and can I write it as Wind:Temp^2 instead?
The I() notation in R's formula syntax means 'as is', i.e. I(a+b) simply means add the sum a+b as a single predictor in the lm model. In your case, I(Wind * Temp^2) means include as a predictor variable the product of Wind and Temp squared. The I() function is used so that the arithmetic operators inside it are not confused with the operators of the formula syntax. For more info, page 2 here explains it in full detail.
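To make the difference concrete, here is a small sketch (my own example, using the built-in airquality data from the question) that lists the terms R builds with and without I(). The variable names plain, as_is and fit are just illustrative.

```r
# Without I(): '+' is a formula operator, so Wind and Temp
# enter the model as two separate terms.
plain <- attr(terms(Ozone ~ Wind + Temp), "term.labels")

# With I(): '+' is ordinary arithmetic, so the sum enters
# the model as one single predictor.
as_is <- attr(terms(Ozone ~ I(Wind + Temp)), "term.labels")

print(plain)
print(as_is)

# The same idea applied to the term from the question:
# one coefficient for the product Wind * Temp^2.
fit <- lm(Ozone ~ I(Wind * Temp^2), data = airquality)
print(names(coef(fit)))
```

The fitted model should have only an intercept and one slope, for the computed column Wind * Temp^2.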
Hope this is clear!
UPDATE: I just want to add Hong Ooi's very good comment on this:
I(Wind * Temp^2) is not the same as Wind:Temp^2. The ^n operator in formula syntax means 'include these variables and all interactions up to n-way'. For example, Y ~ (X + Z + W)^2 is equivalent to Y ~ X + Z + W + X:Z + X:W + Z:W.
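If you want to check the expansion of ^2 yourself, a quick sketch along these lines should work. terms() only parses the formula, so Y, X, Z and W do not need to exist as data; expanded and by_hand are names I made up for the example.

```r
# Let R expand the ^2 operator: main effects plus all two-way interactions.
expanded <- attr(terms(Y ~ (X + Z + W)^2), "term.labels")

# The same model with every term written out by hand.
by_hand <- attr(terms(Y ~ X + Z + W + X:Z + X:W + Z:W), "term.labels")

print(expanded)
print(identical(expanded, by_hand))
```

Both formulas should produce the same set of term labels.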
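So, in our case, Wind:Temp^2 means just Wind:Temp, because outside of I() the term Temp^2 reduces to plain Temp (a variable has no interaction with itself).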
Small illustration: