Unlist/unnest list column into several columns

295 Views Asked by At

I know teh question of unnesting a list column in a data frame has been raised and answered multiple times. However, here's the potentially 237. problem of that kind.

I have the following data:

set.seed(666)
dat <- data.frame(sysRespNum = c(1,2,3,4,5,6),
                  product1   = sqrt(rnorm(6, 20, 5)^2),
                  product2   = sqrt(rnorm(6, 20, 5)^2),
                  product3   = sqrt(rnorm(6, 20, 5)^2))

data:

  sysRespNum  product1 product2 product3
1          1 23.766555 13.46907 24.32327
2          2 30.071773 15.98740 11.39922
3          3 18.224328 11.03880 20.67063
4          4 30.140839 19.78984 19.62087
5          5  8.915628 30.75021 24.29150
6          6 23.791981 11.14885 21.72450

Now, I want to calculate the proportion of each product among the sum of all products, so I want to calculate product1/sum(my three products), then the same for product 2 and 3. So I'm expecting three new columns.

I've tried the following:

library(tidyverse)    
dat %>%
  mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) list(sum_Product = x/sum(x))))

(side question: is there maybe a more straightforward way of mutating this directly without the need to create a list. I now I could create a sum column first and then do a simple mutate along with across. But I'm wondering if teh calculations can be achieved without creating a temporary sum column first)

Now my problem is that it's difficult to unnest the sum_Product list column. unnest_wider doesn't work, the sum_Product column is still a list.

So the only thing that worked for me is

full code:

dat %>%
  mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) data.frame(sum_Product = x/sum(x)))) %>%
  unnest(cols = everything()) %>%
  mutate(product = rep(1:3, nrow(.)/3)) %>%
  pivot_wider(values_from = sum_Product,
              names_from = product,
              names_prefix = "share_product")

which gives the correct result:

# A tibble: 6 x 7
  sysRespNum product1 product2 product3 share_product1 share_product2
       <dbl>    <dbl>    <dbl>    <dbl>          <dbl>          <dbl>
1          1    23.8      13.5     24.3          0.386          0.219
2          2    30.1      16.0     11.4          0.523          0.278
3          3    18.2      11.0     20.7          0.365          0.221
4          4    30.1      19.8     19.6          0.433          0.285
5          5     8.92     30.8     24.3          0.139          0.481
6          6    23.8      11.1     21.7          0.420          0.197
# … with 1 more variable: share_product3 <dbl>

However, it feels unnecessary complicated to unnest everything and then reshape with pivot_wider.

So a) is there a more elegant way to calculate my share variables and b) is there a more elegant/shorter/less verbose way of reshaping a list column into multiple vector columns?

3

There are 3 best solutions below

8
akrun On BEST ANSWER

It is easier to do this rowSums i.e. divide the 'product1' by the rowSums on the columns that starts with key word 'product'. Instead of doing rowwise with c_across, this is vectorized and should be fast as well

library(dplyr)
dat %>%
    mutate(sum_product = product1/rowSums(select(., starts_with('product'))))

NOTE: There is a mixing of base R code (apply) and the tidyverse option with across which doesn't seem to be the optimal way


If we need to do this for all the 'product' columns, create a sum column first with mutate and then use across on the columns that starts with 'product' to divide the column by 'Sum_col'

dat %>%
     mutate(Sum_col = rowSums(select(., starts_with('product'))),
           across(starts_with('product'),
        ~ ./Sum_col, .names = '{.col}_sum_product')) %>%
     select(-Sum_col)

-output

#ysRespNum  product1 product2 product3 product1_sum_product product2_sum_product product3_sum_product
#1          1 23.766555 13.46907 24.32327            0.3860783            0.2187998            0.3951219
#2          2 30.071773 15.98740 11.39922            0.5233660            0.2782431            0.1983909
#3          3 18.224328 11.03880 20.67063            0.3649701            0.2210688            0.4139610
#4          4 30.140839 19.78984 19.62087            0.4333597            0.2845348            0.2821054
#5          5  8.915628 30.75021 24.29150            0.1393996            0.4807925            0.3798079
#6          6 23.791981 11.14885 21.72450            0.4198684            0.1967490            0.3833826

Or using base R

nm1 <- startsWith(names(dat), 'product')
dat[paste0('sum_product', seq_along(nm1))] <- dat[nm1]/rowSums(dat[nm1])
0
ThomasIsCoding On

I guess the following base R code should work for you

cbind(
  dat,
  setNames(dat[-1] / rowSums(dat[-1]), paste0("share_product", seq_along(dat[-1])))
)

which gives

  sysRespNum  product1 product2 product3 share_product1 share_product2
1          1 23.766555 13.46907 24.32327      0.3860783      0.2187998
2          2 30.071773 15.98740 11.39922      0.5233660      0.2782431
3          3 18.224328 11.03880 20.67063      0.3649701      0.2210688
4          4 30.140839 19.78984 19.62087      0.4333597      0.2845348
5          5  8.915628 30.75021 24.29150      0.1393996      0.4807925
6          6 23.791981 11.14885 21.72450      0.4198684      0.1967490
  share_product3
1      0.3951219
2      0.1983909
3      0.4139610
4      0.2821054
5      0.3798079
6      0.3833826
0
Gwang-Jin Kim On

Good old plain basic R

rdat <- dat[-1]
rdat <- rdat/rowSums(rdat)
colnames(rdat) <- paste0("share_", colnames(rdat))
cbind(dat, rdat)

Which gives:

  sysRespNum  product1 product2 product3 share_sum_product1 share_sum_product2
1          1 23.766555 13.46907 24.32327          0.3860783          0.2187998
2          2 30.071773 15.98740 11.39922          0.5233660          0.2782431
3          3 18.224328 11.03880 20.67063          0.3649701          0.2210688
4          4 30.140839 19.78984 19.62087          0.4333597          0.2845348
5          5  8.915628 30.75021 24.29150          0.1393996          0.4807925
6          6 23.791981 11.14885 21.72450          0.4198684          0.1967490
  share_sum_product3
1          0.3951219
2          0.1983909
3          0.4139610
4          0.2821054
5          0.3798079
6          0.3833826