Let's say we have the question "Why are you not happy?" with 5 possible answers (1, 2, 3, 4, 5):
s = data.frame(subjects = 1:12,
               Why_are_you_not_happy = c(1,2,4,5,1,2,4,3,2,1,3,4))
In the previous example every subject picked only one option. But let's say that subjects 3, 7 and 10 each picked more than one option:
- subject 3: options 1, 2, 5
- subject 7: options 3, 4
- subject 10: options 1, 5
I want to code the answers to this question so that the multiple options picked by these 3 subjects are taken into account, while preserving the shape of the dataframe.
The next case is a dataframe that includes 2 questions, as follows:
df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy =
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad =
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1))
How can we properly code the data for the first and second scenarios? The objective is to apply multiple correspondence analysis (MCA).
Thank you
Edit 1:
With your updated example data you have (at least) two options: you can separate each column, or you can pivot_longer() the data and group the "scores" together. The first option, separating the columns, is what I think you should use for MCA.
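The code from this answer did not survive here, so the following is only a minimal sketch of one possible coding for MCA, not necessarily what was originally posted. It assumes dplyr and tidyr are installed, and it assumes a yes/no indicator column per question and option is an acceptable coding. The name `indicators` and the `picked` helper column are made up for illustration. The result keeps one row per subject.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  subjects = 1:12,
  Why_are_you_not_happy = c(1, 2, "1,2,5", 5, 1, 2, "3,4", 3, 2, "1,5", 3, 4),
  why_are_you_sad = c("1,2,3", 1, 2, 3, "4,5,3", 2, 1, 4, 3, 1, 1, 1)
)

indicators <- df %>%
  # stack both questions: one row per subject and question
  pivot_longer(-subjects, names_to = "question", values_to = "option") %>%
  # split "1,2,5" into three rows, one per picked option
  separate_rows(option, sep = ",") %>%
  # hypothetical helper column marking that the option was picked
  mutate(picked = "yes") %>%
  # back to one row per subject, one column per question/option pair
  pivot_wider(
    id_cols = subjects,
    names_from = c(question, option),
    values_from = picked,
    values_fill = "no",
    names_sort = TRUE
  ) %>%
  # MCA functions generally expect categorical (factor) columns
  mutate(across(-subjects, ~ factor(.x, levels = c("no", "yes"))))
```

The indicator columns (everything except `subjects`) could then be handed to an MCA routine, for example `FactoMineR::MCA()` if that package is available; which routine to use is an assumption, since the question does not name one.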
Second approach for handling the data that 'works better' for plotting with e.g. ggplot:
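As a hedged sketch of what such a long format might look like (the original code is not preserved here, so this is an illustration rather than the posted solution), the following assumes dplyr and tidyr are installed. The names `long` and `counts` are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  subjects = 1:12,
  Why_are_you_not_happy = c(1, 2, "1,2,5", 5, 1, 2, "3,4", 3, 2, "1,5", 3, 4),
  why_are_you_sad = c("1,2,3", 1, 2, 3, "4,5,3", 2, 1, 4, 3, 1, 1, 1)
)

long <- df %>%
  # one row per subject and question
  pivot_longer(-subjects, names_to = "question", values_to = "option") %>%
  # one row per subject, question and picked option; convert "1" to 1L
  separate_rows(option, sep = ",", convert = TRUE)

# how often each option was picked, per question
counts <- count(long, question, option)
```

A table like `counts` maps directly onto ggplot2 aesthetics, for instance `option` on the x axis, `n` on the y axis, and one facet per `question`. Note that this format no longer has one row per subject.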
Created on 2022-10-06 by the reprex package (v2.0.1)
Original answer:
It sounds like you want the separate() function from the tidyr package. Or, perhaps, you want the data in long format?
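A minimal sketch of both ideas for the single-question case, assuming tidyr is installed and that the multiple answers are stored as comma-separated strings (as in the two-question example). The column names `choice_1` to `choice_3` and the object names `wide` and `long` are made up for illustration; this is not necessarily the code originally posted.

```r
library(tidyr)

s <- data.frame(
  subjects = 1:12,
  Why_are_you_not_happy = c(1, 2, "1,2,5", 5, 1, 2, "3,4", 3, 2, "1,5", 3, 4)
)

# Wide: one column per picked option, padded with NA on the right
wide <- separate(
  s,
  col = Why_are_you_not_happy,
  into = c("choice_1", "choice_2", "choice_3"),
  sep = ",",
  fill = "right",
  convert = TRUE
)

# Long: one row per subject and picked option
long <- separate_rows(s, Why_are_you_not_happy, sep = ",", convert = TRUE)
```

The wide version keeps 12 rows, one per subject, at the cost of NA padding; the long version repeats the subject id for every option picked.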
Created on 2022-10-05 by the reprex package (v2.0.1)