I know teh question of unnesting a list column in a data frame has been raised and answered multiple times. However, here's the potentially 237. problem of that kind.
I have the following data:
set.seed(666)
dat <- data.frame(sysRespNum = c(1,2,3,4,5,6),
product1 = sqrt(rnorm(6, 20, 5)^2),
product2 = sqrt(rnorm(6, 20, 5)^2),
product3 = sqrt(rnorm(6, 20, 5)^2))
data:
sysRespNum product1 product2 product3
1 1 23.766555 13.46907 24.32327
2 2 30.071773 15.98740 11.39922
3 3 18.224328 11.03880 20.67063
4 4 30.140839 19.78984 19.62087
5 5 8.915628 30.75021 24.29150
6 6 23.791981 11.14885 21.72450
Now, I want to calculate the proportion of each product among the sum of all products, so I want to calculate product1/sum(my three products)
, then the same for product 2 and 3. So I'm expecting three new columns.
I've tried the following:
library(tidyverse)
dat %>%
mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) list(sum_Product = x/sum(x))))
(side question: is there maybe a more straightforward way of mutating this directly without the need to create a list. I now I could create a sum column first and then do a simple mutate along with across. But I'm wondering if teh calculations can be achieved without creating a temporary sum column first)
Now my problem is that it's difficult to unnest the sum_Product
list column. unnest_wider
doesn't work, the sum_Product
column is still a list.
So the only thing that worked for me is
- following this solution: https://stackoverflow.com/a/60824506/2725773
- changing my code above and replace the
list
part bydata.frame
:
full code:
dat %>%
mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) data.frame(sum_Product = x/sum(x)))) %>%
unnest(cols = everything()) %>%
mutate(product = rep(1:3, nrow(.)/3)) %>%
pivot_wider(values_from = sum_Product,
names_from = product,
names_prefix = "share_product")
which gives the correct result:
# A tibble: 6 x 7
sysRespNum product1 product2 product3 share_product1 share_product2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 23.8 13.5 24.3 0.386 0.219
2 2 30.1 16.0 11.4 0.523 0.278
3 3 18.2 11.0 20.7 0.365 0.221
4 4 30.1 19.8 19.6 0.433 0.285
5 5 8.92 30.8 24.3 0.139 0.481
6 6 23.8 11.1 21.7 0.420 0.197
# … with 1 more variable: share_product3 <dbl>
However, it feels unnecessary complicated to unnest everything and then reshape with pivot_wider.
So a) is there a more elegant way to calculate my share variables and b) is there a more elegant/shorter/less verbose way of reshaping a list column into multiple vector columns?
It is easier to do this
rowSums
i.e. divide the 'product1' by therowSums
on the columns that starts with key word 'product'. Instead of doingrowwise
withc_across
, this is vectorized and should be fast as wellNOTE: There is a mixing of
base R
code (apply
) and the tidyverse option withacross
which doesn't seem to be the optimal wayIf we need to do this for all the 'product' columns, create a
sum
column first withmutate
and then useacross
on the columns that starts with 'product' to divide the column by 'Sum_col'-output
Or using
base R