I have this data from an Alzheimer's disease patient cohort. I would like to create a summary table (or contingency table) that shows the key information for this cohort: how many males and females there are, the average age of onset, the average age at last visit, the average age at death, and the number of samples (IID) with apoe4any. What approach should I take to create such a table in R?
dat <- structure(list(IID = structure(1:10, .Names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"), .Label = c("NACC000875",
"NACC003779", "NACC006805", "NACC008215", "NACC010067", "NACC010592",
"NACC011413", "NACC015383", "NACC017476", "NACC017538"), class = "factor"),
cohort = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
), .Label = "ADC8_AA", class = "factor"), sex = structure(c(`1` = 2L,
`2` = 2L, `3` = 2L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 1L,
`8` = 2L, `9` = 2L, `10` = 2L), .Label = c("1", "2"), class = "factor"),
status = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 2L, `6` = 1L, `7` = 2L, `8` = 1L, `9` = 2L, `10` = 2L
), .Label = c("1", "2"), class = "factor"), Race = structure(c(`1` = 1L,
`2` = 1L, `3` = 1L, `4` = 1L, `5` = 1L, `6` = 1L, `7` = 1L,
`8` = 1L, `9` = 1L, `10` = 1L), .Label = "2", class = "factor"),
Ethnicity = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
), .Label = "0", class = "factor"), age_onset = structure(c(NA,
NA, NA, NA, 1L, NA, 4L, NA, 2L, 3L), .Label = c(" 63", " 67",
" 71", " 79", "888"), class = "factor"), age_last_visit = structure(c(`1` = 6L,
`2` = 4L, `3` = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 8L,
`8` = 7L, `9` = 1L, `10` = 5L), .Label = c("70", "71", "74",
"77", "78", "82", "86", "89"), class = "factor"), age_death = structure(c(NA,
NA, NA, 1L, NA, NA, 3L, 2L, NA, NA), .Label = c(" 72", " 88",
" 90", "888"), class = "factor"), apoe4any = structure(c(`1` = 1L,
`2` = 2L, `3` = 1L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 2L,
`8` = 2L, `9` = 2L, `10` = 2L), .Label = c("0", "1"), class = "factor")), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
R uses the factor class for categorical data. If you change your ages (which are currently factors) to numeric, then summary(dat) will give you most of what you want. See this common FAQ for an explanation of my factor-to-numeric conversion.
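As a minimal sketch of that conversion, the example below rebuilds only the age_onset column from the dput() output in the question (the variable names codes and ages are illustrative). It shows why calling as.numeric() directly on a factor is not enough: it returns the internal level codes, so the factor has to go through as.character() first.

```r
# age_onset as stored in the question: a factor whose levels are
# " 63", " 67", " 71", " 79" and an unused "888"
age_onset <- factor(
  c(NA, NA, NA, NA, " 63", NA, " 79", NA, " 67", " 71"),
  levels = c(" 63", " 67", " 71", " 79", "888")
)

# as.numeric() on a factor returns the integer level codes, not the ages
codes <- as.numeric(age_onset)

# Converting to character first recovers the labels, which then parse as numbers
ages <- as.numeric(as.character(age_onset))

print(codes)
print(ages)
print(mean(ages, na.rm = TRUE))
```

The leading spaces in the labels are harmless here, because as.numeric() ignores surrounding whitespace when parsing a string.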
You can also subset the data if you only want to summarize the columns you mention:
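One possible way to do this is sketched below. To keep the example runnable on its own, it rebuilds just the relevant columns from the dput() output in the question (omitting the unused "888" level); with the real data you would skip that step and use dat directly. The names age_cols, keep and cohort_summary are illustrative, and the sex codes are left as 1 and 2 because the question does not say which is which.

```r
# Relevant columns of dat, rebuilt from the question's dput() (all factors there)
dat <- data.frame(
  sex            = factor(c(2, 2, 2, 2, 2, 1, 1, 2, 2, 2)),
  age_onset      = factor(c(NA, NA, NA, NA, " 63", NA, " 79", NA, " 67", " 71")),
  age_last_visit = factor(c(82, 77, 74, 71, 70, 70, 89, 86, 70, 78)),
  age_death      = factor(c(NA, NA, NA, " 72", NA, NA, " 90", " 88", NA, NA)),
  apoe4any       = factor(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 1))
)

# Convert the age columns from factor to numeric via character
age_cols <- c("age_onset", "age_last_visit", "age_death")
dat[age_cols] <- lapply(dat[age_cols], function(x) as.numeric(as.character(x)))

# Summarize only the columns of interest:
# counts for the factors, quartiles and means for the ages
keep <- c("sex", age_cols, "apoe4any")
print(summary(dat[keep]))

# Alternatively, collect the requested figures in a one-row table
cohort_summary <- data.frame(
  n_sex_1             = sum(dat$sex == "1"),
  n_sex_2             = sum(dat$sex == "2"),
  mean_age_onset      = mean(dat$age_onset, na.rm = TRUE),
  mean_age_last_visit = mean(dat$age_last_visit, na.rm = TRUE),
  mean_age_death      = mean(dat$age_death, na.rm = TRUE),
  n_apoe4any          = sum(dat$apoe4any == "1")
)
print(cohort_summary)
```

Note that na.rm = TRUE makes each mean use only the patients with a recorded value, so the averages for onset and death are based on far fewer than ten individuals.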