Counting occurrence of diagnosis code across multiple columns in large R dataset

I'm using two years of NIS data (already combined) and want to search for a diagnosis code across all of the DX columns. The columns run from I10_DX1 to I10_DX40 (columns 18 to 57). I want to create a new dataset containing only the observations that have this diagnosis code in any of these columns.

I've tried loops and the ICD packages but haven't been able to get it right. Most recently I tried the following code:

       get_icd_labels(icd3 = c("J80"), year = 2018:2019) %>%
       arrange(year, icd_sub) %>% 
       filter(icd_sub %in% c("J80")) %>% 
       select(year, icd_normcode, label) %>% 
       knitr::kable(row.names = FALSE)

There is 1 solution below

This is a tidyverse (dplyr and tidyr) solution. If you don't already have a unique id for each record, I'd start out by adding one.

library(dplyr)  # %>%, mutate(), select(), filter(), inner_join()
library(tidyr)  # gather()
df <-
  df %>%
  mutate(my_id = row_number())

Next, I'd gather the diagnosis codes into a table where each record is a single diagnosis.

diagnoses <-
  df %>%
  select(my_id, 18:57) %>%
  gather("diag_num", "diag_code", 2:ncol(.)) %>%
  filter(!is.na(diag_code)) # No need to keep a bunch of empty rows
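
For what it's worth, gather() is marked as superseded in current tidyr, and pivot_longer() does the same reshaping. Here is a minimal sketch on made-up data. The toy data frame and its three diagnosis columns are assumptions for illustration only, not real NIS records, and selecting by name with starts_with("I10_DX") stands in for the positional 18:57.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the NIS data (hypothetical values): three records,
# three diagnosis columns instead of forty.
toy <- tibble(
  my_id   = 1:3,
  I10_DX1 = c("J80", "I10", "E119"),
  I10_DX2 = c(NA, "J80", NA),
  I10_DX3 = c(NA, "J80", NA)
)

# One row per (record, diagnosis column) pair, dropping empty slots.
toy_diagnoses <-
  toy %>%
  pivot_longer(
    cols = starts_with("I10_DX"),
    names_to = "diag_num",
    values_to = "diag_code",
    values_drop_na = TRUE # same effect as filter(!is.na(diag_code))
  )
```

If the empty diagnosis slots were read in as empty strings rather than NA, they would need an extra filter such as filter(diag_code != "").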

Finally, I would join my original df to the diagnoses data frame and filter for the code I want.

df %>%
  inner_join(diagnoses, by = "my_id") %>%
  filter(diag_code == "J80")
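
One thing to watch for: the join returns one row per matching diagnosis column, so a record that carries the code in two columns will show up twice. If you only want each matching record once, a possible alternative is to filter the wide data directly with if_any(), which needs dplyr 1.0.4 or later. This is a sketch on made-up data, not tested against real NIS files, and the toy data frame and the name toy_hits are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for the NIS data (hypothetical values).
toy <- tibble(
  my_id   = 1:3,
  I10_DX1 = c("J80", "I10", "E119"),
  I10_DX2 = c(NA, "J80", NA),
  I10_DX3 = c(NA, "J80", NA)
)

# Keep a record if ANY diagnosis column holds the code.
# %in% returns FALSE for NA, so empty slots never match.
toy_hits <-
  toy %>%
  filter(if_any(starts_with("I10_DX"), ~ .x %in% "J80"))
```

Here record 2 has the code in two columns but is kept only once, and the result has the same columns as the original data. On the real data you could swap starts_with("I10_DX") for I10_DX1:I10_DX40 or the positional 18:57.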