Collapse rows that match in more than one column

Question

Collapse rows that match in more than one column

64 Views Asked by raffaelluca At 06 March 2024 at 09:28

I have a data frame where in the first column I have a Patient ID and in the second column I have a Date (date time) when a specific test was taken. The columns thereafter are columns of the specific tests (b, a, c, d). At the moment, for every Patient ID and every Date I have a row with the results of the specific test and NA in the columns of all other tests.

I want to collapse the data frame in a way that whenever Patient ID and Date match, the test results are collapsed into one row only.

I have tried aggregate(), collapse(), and group_by(), but did not manage to do this. The functions are only activated when Patient ID and Date match, not just one of the columns.

Original Q&A

There are 2 best solutions below

**Friede** · Answer 1 · 2024-03-06T09:59:08.450000

Data copied from picture:

X = read.table(text = "PatientID Date b a c d
               1 2024-03-0420:16:11 0.4 NA NA NA 
               1 2024-03-0422:40:11 NA TRUE NA NA 
               1 2024-03-0522:40:11 NA NA FALSE NA
               1 2024-03-0622:40:11 NA NA NA Strong
               2 2024-03-0422:40:11 NA NA NA Weak
               2 2024-03-0522:40:11 NA NA FALSE NA
               3 2024-03-0622:40:11 0.6 NA NA NA
               3 2024-03-0622:40:11 NA FALSE NA NA", header = TRUE)

Extract Date (please specify tz (time zone)). You probably need to adapt format. This step is needed due the bad data structure (I needed to copy the data from picture). Then aggregate:

X$Date = as.Date(as.POSIXct(X$Date, format = "%Y-%m-%d%H:%M:%S"), tz = "")
aggregate(. ~ PatientID + Date, data = X, FUN = na.omit, na.action = "na.pass")

gives

  PatientID       Date   b     a     c      d
1         1 2024-03-04 0.4  TRUE             
2         2 2024-03-04                   Weak
3         1 2024-03-05           FALSE       
4         2 2024-03-05           FALSE       
5         1 2024-03-06                 Strong
6         3 2024-03-06 0.6 FALSE

If you don't like the order, change it. E.g. PatientID in ascending order as the raw data (X) is organised:

> Y = aggregate(. ~ PatientID + Date, data = X, FUN = na.omit, na.action = "na.pass")
> Y[order(Y$PatientID), ]
  PatientID       Date   b     a     c      d
1         1 2024-03-04 0.4  TRUE             
3         1 2024-03-05           FALSE       
5         1 2024-03-06                 Strong
2         2 2024-03-04                   Weak
4         2 2024-03-05           FALSE       
6         3 2024-03-06 0.6 FALSE

**Chris Ruehlemann** · Answer 2 · 2024-03-06T10:37:30.547000

Not exactly sure what you need; here's a tidyverse solution:

X %>%
  group_by(PatientID, Date) %>%
  summarise(
            a = str_c(a[!is.na(a)], collapse = ","),
            b = str_c(b[!is.na(b)], collapse = ","),
            c = str_c(c[!is.na(c)], collapse = ","),
            d = str_c(d[!is.na(d)], collapse = ",")) %>%
  mutate(across(a:d, ~na_if(., "")))
# A tibble: 7 × 6
# Groups:   PatientID [3]
  PatientID Date               a     b     c     d     
      <int> <chr>              <chr> <chr> <chr> <chr> 
1         1 2024-03-0420:16:11 NA    0.4   NA    NA    
2         1 2024-03-0422:40:11 TRUE  NA    NA    NA    
3         1 2024-03-0522:40:11 NA    NA    FALSE NA    
4         1 2024-03-0622:40:11 NA    NA    NA    Strong
5         2 2024-03-0422:40:11 NA    NA    NA    Weak  
6         2 2024-03-0522:40:11 NA    NA    FALSE NA    
7         3 2024-03-0622:40:11 FALSE 0.6   NA    NA

Data (thks to @Friede):

X = read.table(text = "PatientID Date b a c d
               1 2024-03-0420:16:11 0.4 NA NA NA 
               1 2024-03-0422:40:11 NA TRUE NA NA 
               1 2024-03-0522:40:11 NA NA FALSE NA
               1 2024-03-0622:40:11 NA NA NA Strong
               2 2024-03-0422:40:11 NA NA NA Weak
               2 2024-03-0522:40:11 NA NA FALSE NA
               3 2024-03-0622:40:11 0.6 NA NA NA
               3 2024-03-0622:40:11 NA FALSE NA NA", header = TRUE)

Collapse rows that match in more than one column

There are 2 best solutions below

Related Questions in R

Related Questions in AGGREGATE

Related Questions in R-COLLAPSE

Trending Questions

Popular # Hahtags

Popular Questions