For some reason, I expected different data types (numeric, character, factor) to produce different results in a simple regression. I built a minimal example and was surprised to see that there is no difference.
set.seed(1)
num <- sample(c(0,1), 10, replace=TRUE, prob=c(0.5, 0.5) )
fact <- factor(num, levels = c(0, 1))
char <- ifelse(num==0, "no", "yes")
y <- sample(seq(0,10), 10, replace=TRUE)
df <- data.frame(y, num, fact, char)
str(df)
lm(y ~ num, data=df) # Y = 5.5 + 0.5 num
lm(y ~ char, data=df) # Y = 5.5 + 0.5 charyes ("no" is the reference level)
lm(y ~ fact, data=df) # Y = 5.5 + 0.5 fact1 (0 is the reference level)
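A possible way to see why the three fits coincide is to compare the design matrices that R builds from each encoding. The sketch below is only an illustration: the small vectors and the names `X_num`, `X_fact`, `X_char`, `same_design` and `num3` are made up for this purpose and are not part of the example above. It also adds a three-level variable, where I would expect the numeric and factor encodings to stop agreeing.

```r
# Hypothetical illustration data; names mirror the example above.
num  <- c(0, 1, 1, 0, 1, 0)
fact <- factor(num, levels = c(0, 1))
char <- ifelse(num == 0, "no", "yes")

# Each encoding yields an intercept column plus a single 0/1 column.
# A character variable is converted to a factor with sorted levels.
X_num  <- model.matrix(~ num)
X_fact <- model.matrix(~ fact)
X_char <- model.matrix(~ char)

# Compare the values only, ignoring column names and attributes.
same_design <- all(X_num == X_fact) && all(X_num == X_char)
print(same_design)

# With three values the encodings diverge: the numeric version keeps
# one slope, the factor gets one dummy column per non-reference level.
num3 <- c(0, 1, 2, 0, 1, 2)
ncol(model.matrix(~ num3))          # intercept + 1 slope column
ncol(model.matrix(~ factor(num3)))  # intercept + 2 dummy columns
```

If this reasoning is right, the identical coefficients in my example are a consequence of the variable having only two values coded 0/1, rather than a general property of the three types.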
Question: Under what circumstances does the choice of data type cause a problem? When does it become necessary to convert a variable from one type to another?