Pivoting a data frame using the {collapse} package


I'd like to know how to pivot a long data frame to wide format using the {collapse} package specifically. I like the package's performance, but I sometimes find it hard to use for mid-level data manipulation (e.g., what tidyr::pivot_wider() does).

An example would be taking this tibble:

tbl <- tibble::tibble(
    user_id = rep(1:3, each = 5),
    a =  rep(paste0("item", 1:5), times = 3),
    b =  sample(rnorm(1000), 15)  # random, no seed set: printed values differ between runs
)

# A tibble: 15 x 3
   user_id a            b
     <int> <chr>    <dbl>
 1       1 item1  0.474  
 2       1 item2  0.658  
 3       1 item3 -0.609  
 4       1 item4 -0.710  
 5       1 item5 -0.936  
 6       2 item1 -1.06   
 7       2 item2 -0.307  
 8       2 item3 -1.69   
 9       2 item4  0.669  
10       2 item5  0.776  
11       3 item1 -0.00244
12       3 item2  1.33   
13       3 item3 -0.724  
14       3 item4 -0.646  
15       3 item5  1.69 

And turning it into the following, which tidyr::pivot_wider() produces, but using {collapse} instead:

tbl |> tidyr::pivot_wider(names_from="a", values_from="b")

# A tibble: 3 x 6
  user_id  item1  item2  item3 item4   item5
    <int>  <dbl>  <dbl>  <dbl> <dbl>   <dbl>
1       1 -0.597  0.672 -0.396 1.44  -0.419 
2       2  1.56   0.488 -0.980 0.648 -0.0903
3       3 -0.885 -0.675  0.376 1.02  -0.180 

Answer by Sebastian:

{collapse} 2.0 and later include a powerful pivot() function:

library(collapse)
#> collapse 2.0.3, see ?`collapse-package` or ?`collapse-documentation`

tbl <- tibble::tibble(
  user_id = rep(1:3, each = 5),
  a =  rep(paste0("item", 1:5), times = 3),
  b =  sample(rnorm(1000), 15)
)

# Positional arguments: ids = "user_id", values = "b", names = "a"; how = "w" means "wider"
pivot(tbl, "user_id", "b", "a", how = "w")
#> # A tibble: 3 × 6
#>   user_id  item1 item2  item3  item4  item5
#>     <int>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1       1  1.62  0.986  0.645 -0.605  0.501
#> 2       2 -0.667 1.15   0.963  0.435  1.28 
#> 3       3 -0.720 0.727 -0.129 -0.977 -0.791
# Or omit `values`: the column not used as ids or names (here b) supplies the values
pivot(tbl, "user_id", names = "a", how = "w")
#> # A tibble: 3 × 6
#>   user_id  item1 item2  item3  item4  item5
#>     <int>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1       1  1.62  0.986  0.645 -0.605  0.501
#> 2       2 -0.667 1.15   0.963  0.435  1.28 
#> 3       3 -0.720 0.727 -0.129 -0.977 -0.791
# Or omit `ids`: the column not used as values or names (here user_id) identifies the rows
pivot(tbl, values = "b", names = "a", how = "w")
#> # A tibble: 3 × 6
#>   user_id  item1 item2  item3  item4  item5
#>     <int>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1       1  1.62  0.986  0.645 -0.605  0.501
#> 2       2 -0.667 1.15   0.963  0.435  1.28 
#> 3       3 -0.720 0.727 -0.129 -0.977 -0.791

Created on 2023-10-24 with reprex v2.0.2
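A minimal sketch, assuming {collapse} 2.0 or later. It uses small hypothetical data frames with fixed values (instead of rnorm()) so the results can be checked by eye, spells out the ids/values/names roles with named arguments, and shows the FUN argument, which pivot() uses to combine values when an ids/names combination occurs more than once. The internal FUN options "last" and "mean" are taken from my reading of ?pivot; check the documentation for your installed version.

```r
library(collapse)

# Hypothetical toy data with fixed values (no randomness), so results are checkable
long <- data.frame(
  user_id = rep(1:2, each = 3),
  a       = rep(c("item1", "item2", "item3"), times = 2),
  b       = c(1, 2, 3, 4, 5, 6)
)

# ids: row identifiers; values: cell contents; names: source of the new column names
wide <- pivot(long, ids = "user_id", values = "b", names = "a", how = "wider")
print(wide)

# Hypothetical data where user 1 has two rows for item1 and user 2 has no item2 row
dups <- data.frame(
  user_id = c(1, 1, 1, 2),
  a       = c("item1", "item1", "item2", "item1"),
  b       = c(1, 3, 10, 5)
)

# FUN decides how duplicate user_id/a pairs are combined (assumed options "last", "mean")
agg_last <- pivot(dups, "user_id", "b", "a", how = "wider", FUN = "last")
agg_mean <- pivot(dups, "user_id", "b", "a", how = "wider", FUN = "mean")
print(agg_last)  # keeps the last b value per user_id/a pair
print(agg_mean)  # averages duplicate b values per user_id/a pair
# user 2 has no item2 row, so that cell is filled with NA in both results
```

If duplicates are unexpected in your data, ?pivot also documents a check.dups argument that can flag them instead of silently aggregating.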