My linear model would be Score ~ Age + Collection1 + Collection3
I transform the Collection column into dummy variables, and I leave out the Collection5 column to avoid the dummy variable trap.
For Collections 1, 3, and 5, I am sampling the same people, and each collection takes place in a fixed year (Collection 1 = year 2000, Collection 3 = year 2002, Collection 5 = year 2004), hence the +2 in age for the same contacts from one collection to the next.
Would the variables Age, Collection1, and Collection3 be multicollinear? On one hand, I feel like an increase in age is correlated with a later collection, but on the other hand, since Collection is transformed into multiple dummy variables of 1s and 0s, it seems like it shouldn't matter.
Is there a logical explanation as to why it should or shouldn't be multicollinear or break other assumptions?
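
To make the question concrete, here is a minimal sketch of one way the overlap between Age and the Collection dummies could be measured: the variance inflation factor (VIF) for Age, i.e. 1 / (1 - R²) from regressing Age on Collection1 and Collection3. The ages below are synthetic (an assumption for illustration, not my data), and `age_vif`, `n_people`, and the 20 to 60 baseline age range are hypothetical names and values. It relies on the fact that regressing on an intercept plus a full set of group dummies gives fitted values equal to the group means, so R² is the between-collection share of the variance in Age.

```python
import random
from statistics import mean


def age_vif(n_people=200, seed=0):
    """Return (R^2, VIF) for Age regressed on the Collection dummies.

    Synthetic panel: the same people are observed in 2000, 2002 and 2004,
    so each person's age rises by 2 between consecutive collections.
    """
    rng = random.Random(seed)
    # Hypothetical baseline ages at Collection 1 (year 2000).
    baseline = [rng.randint(20, 60) for _ in range(n_people)]
    ages_by_collection = {
        "Collection1": list(baseline),
        "Collection3": [a + 2 for a in baseline],
        "Collection5": [a + 4 for a in baseline],  # reference level
    }

    all_ages = [a for ages in ages_by_collection.values() for a in ages]
    grand_mean = mean(all_ages)
    ss_total = sum((a - grand_mean) ** 2 for a in all_ages)

    # Residuals of Age ~ Collection1 + Collection3 are deviations from
    # each collection's mean age.
    ss_within = 0.0
    for ages in ages_by_collection.values():
        group_mean = mean(ages)
        ss_within += sum((a - group_mean) ** 2 for a in ages)

    r_squared = 1 - ss_within / ss_total
    return r_squared, 1 / (1 - r_squared)


r2, vif = age_vif()
print(f"R^2 of Age on Collection dummies: {r2:.4f}, VIF for Age: {vif:.4f}")
```

With real data, the same quantity could be computed by replacing the synthetic `ages_by_collection` with the observed ages per collection. How large the VIF gets depends on how much ages vary between people compared with the 2-year steps between collections.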