Let's say I have a data frame df as follows:
df <- data.frame(type = c("A","B","AB","O","O","B","A"))
There are obviously 4 distinct values of type here. In my actual data, however, I don't know how many distinct values the type column contains. The number of dummy variables should be one less than the number of distinct values, so in this example there should be 3. My expected output looks like this:
df <- data.frame(type = c("A","B","AB","O","O","B","A"),
A = c(1,0,0,0,0,0,1),
B = c(0,1,0,0,0,1,0),
AB = c(0,0,1,0,0,0,0))
Here I used A, B and AB as the dummy variables, but which values get chosen doesn't matter. Even without knowing the values of type or how many of them there are, I want to turn the column into dummy variables.
This is treatment contrast coding: one level acts as the baseline, and each remaining level gets its own 0/1 column. First, you need a factor variable.
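The code for this step did not survive in the post, so here is a minimal sketch of what it might look like. The name f is an assumption made for illustration; factor() works out the distinct values itself, so you never have to know them in advance.

```r
# Example data from the question
df <- data.frame(type = c("A", "B", "AB", "O", "O", "B", "A"))

# Convert the column to a factor; the levels are the sorted distinct values
f <- factor(df$type)

levels(f)   # the distinct values found in the column
nlevels(f)  # how many there are
```

With k levels, treatment coding will produce k - 1 dummy columns, which is the "one less" count the question asks for.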
Now, apply treatment contrasts coding.
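One way this step could be written, as a sketch rather than the original code: contr.treatment(k) returns a k x (k - 1) matrix of 0/1 values in which row i is the coding for level i, and the first level is the all-zero baseline. Indexing its rows by the factor's integer codes yields one coded row per observation. The names f, cm and m are assumptions.

```r
df <- data.frame(type = c("A", "B", "AB", "O", "O", "B", "A"))
f <- factor(df$type)

# k x (k - 1) coding matrix; level 1 is the baseline (all zeros)
cm <- contr.treatment(nlevels(f))

# Pick the coding row for each observation.
# drop = FALSE keeps a matrix even when there are only two levels.
m <- cm[as.integer(f), , drop = FALSE]

dim(m)  # one row per observation, one column per non-baseline level
```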
Finally you want to have nice row/column names for readability.
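A sketch of the naming step, assuming the same hypothetical names f and m: the columns produced by contr.treatment(k) correspond to levels 2 through k, so levels(f)[-1] supplies matching column names, and the row names are simply the observation numbers.

```r
df <- data.frame(type = c("A", "B", "AB", "O", "O", "B", "A"))
f <- factor(df$type)
m <- contr.treatment(nlevels(f))[as.integer(f), , drop = FALSE]

# Rows: observation index; columns: every level except the baseline (the first)
dimnames(m) <- list(seq_along(f), levels(f)[-1])

colnames(m)
```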
The resulting m is a matrix. If you want a data frame, do data.frame(m).
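Putting the pieces together, the sketch below attaches the dummy columns to the original data frame. It makes one choice that is not part of the answer above: passing base = k to contr.treatment() makes the last level the baseline instead of the first, which for this data happens to give the A, B and AB columns shown in the question. The names f, k, m and out are assumptions.

```r
df <- data.frame(type = c("A", "B", "AB", "O", "O", "B", "A"))
f <- factor(df$type)
k <- nlevels(f)

# base = k: use the last level (here "O") as the all-zero baseline
m <- contr.treatment(k, base = k)[as.integer(f), , drop = FALSE]
dimnames(m) <- list(seq_along(f), levels(f)[-k])

# Convert to a data frame and bind it next to the original column
out <- cbind(df, data.frame(m))
out
```

Because the level names become column names, data.frame() may adjust any that are not syntactically valid R names.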