Extracting specific group value from a data.table list and returning as a vector in R

I have a list of data that looks like this:

library(data.table)

dt1 <- data.table(age_group = c("young", "old"), ratio = runif(2))
dt2 <- data.table(age_group = c("young", "old"), ratio = runif(2))

dt_list <- list(
    list(ratio_by_group = dt1),
    list(ratio_by_group = dt2),
    list(ratio_by_group = NA)
)
dt_list
[[1]]
[[1]]$ratio_by_group
   age_group     ratio
1:     young 0.5956572
2:       old 0.5053023


[[2]]
[[2]]$ratio_by_group
   age_group     ratio
1:     young 0.4632962
2:       old 0.2356656


[[3]]
[[3]]$ratio_by_group
[1] NA

I want to extract the ratio for a specific group (e.g., 'young') from each list element and return the results as a vector. I used the following code:

sapply(dt_list, function(x) ifelse(!is.na(x$ratio_by_group), x$ratio_by_group[age_group=='young', ratio], NA))
          [,1]      [,2]
[1,] 0.5956572 0.4632962
[2,] 0.5956572 0.4632962
[3,] 0.5956572 0.4632962
[4,] 0.5956572 0.4632962

My expected output is

[1] "0.5956572" "0.4632962" NA 

Onyambu (best answer)

Use recursion to walk the nested list and pull the value out of each data.table:

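fn <- function(x){
    # data.table: return the 'young' ratio; nested list: recurse and flatten;
    # anything else (e.g. list(ratio_by_group = NA)): return NA so the slot is kept
    if(is.data.table(x)) x[age_group == 'young', ratio]
    else if(is.list(x[[1]])) unlist(lapply(x, fn),use.names = FALSE)
    else NA_real_
}

fn(dt_list)
[1] 0.6224014 0.8436315        NA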
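
If the nesting depth is fixed, as it is in the question, a non-recursive variant may be easier to follow. This is only a sketch: it assumes every element has a `ratio_by_group` entry that is either a data.table or `NA`, it uses fixed ratios instead of `runif()` so the result is reproducible, and the helper name `get_ratio` and its `grp` argument are made up for illustration.

```r
library(data.table)

# Example data in the same shape as the question (fixed values instead of runif)
dt1 <- data.table(age_group = c("young", "old"), ratio = c(0.6, 0.5))
dt2 <- data.table(age_group = c("young", "old"), ratio = c(0.4, 0.2))
dt_list <- list(
    list(ratio_by_group = dt1),
    list(ratio_by_group = dt2),
    list(ratio_by_group = NA)
)

# Hypothetical helper: one number per list element, NA when there is no table
get_ratio <- function(x, grp = "young") {
    tbl <- x$ratio_by_group
    if (!is.data.table(tbl)) return(NA_real_)
    val <- tbl[age_group == grp, ratio]
    # guard against a missing or duplicated group so vapply() always gets length 1
    if (length(val) == 1L) val else NA_real_
}

res <- vapply(dt_list, get_ratio, numeric(1))
res
# [1] 0.6 0.4  NA
```

`vapply()` with `numeric(1)` is used instead of `sapply()` so that the result is always a plain numeric vector, even if one element behaves unexpectedly.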
jay.sf

You could case-handle the NAs and subset the rest to your requirements.

> sapply(dt_list, \(x) if (all(is.na(x))) NA else unname(unlist(el(x)[age_group == 'young', 'ratio'])))
[1] 0.4314703 0.1398862        NA
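
As for why a plain `if`/`else` works here while the `ifelse()` call in the question did not: `ifelse()` is vectorised and returns a result shaped like its test, and `is.na()` applied to a data.table tests every cell. A small sketch of the difference, using a made-up table `dt` with fixed values:

```r
library(data.table)

dt <- data.table(age_group = c("young", "old"), ratio = c(0.6, 0.5))

# is.na() on a table tests every cell, so the test is a 2 x 2 logical matrix
test <- !is.na(dt)
dim(test)

# ifelse() returns something shaped like `test`, recycling the single value
bad <- ifelse(test, dt[age_group == "young", ratio], NA)
dim(bad)

# if/else evaluates one condition and returns the selected value unchanged
good <- if (is.data.table(dt)) dt[age_group == "young", ratio] else NA_real_
good
```

That is why the attempt in the question produced four repeated values per table instead of one.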

Data:

> dput(dt_list)
list(list(ratio_by_group = structure(list(age_group = c("young", 
"old"), ratio = c(0.431470313109457, 0.41186331724748)), row.names = c(NA, 
-2L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x561d15f42ac0>, index = structure(integer(0), "`__age_group`" = 2:1))), 
    list(ratio_by_group = structure(list(age_group = c("young", 
    "old"), ratio = c(0.139886221848428, 0.171009620418772)), row.names = c(NA, 
    -2L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x561d15f42ac0>, index = structure(integer(0), "`__age_group`" = 2:1))), 
    list(ratio_by_group = NA))