Let's say I work in a psychological context and I'm wondering how many risk factors a patient has. After that, I would like to list all the risks and then discover the most prevalent risk (mode). I'm thinking on use mutate and then paste0 and get the colname if the value of the row is "risk". However, I'm having a hard time with that.
any help is appreaciated.
Code is below:
library(tidyverse)
df = data.frame(
patient = seq(1:60),
cancer = c("risk","ok"),
blood_pres = c("risk", "ok"),
low_education = c("risk","ok")
)
df = df %>% mutate(how_many_risks =
rowSums(. == "risk"))

Let's come up with some more interesting data.
From here, we'll pivot, summarize, then join back onto the original data.
(Note that
dplyr_1.1.0or newer is required to use.by=. If you have an older dplyr and will not update, shift to usinggroup_by(patient)instead of.by=patient.)Something you may want to consider: unless this is solely for presentation tables, it is occasionally advantageous to have
risksas a list-column instead of a comma-delimited string. To do this, just replacetoStringwithlist, and while it may render the same on the console, it will allow things like set-ops on it (though normal column/vector operations may not work as you expect):If this data were a tibble (
tbl_df) instead, the same data would present asWe can do things directly such as check the lengths of each row in that column; or check quickly for exact set-membership:
Granted, both of those can be done with regex, but .. if the names have any ambiguity, regex carries a little overhead.