How to properly use pivot_wider() to align the values of two variables?

32 Views Asked by At

I have one dataset as below.

library(dplyr)
library(tidyr)

df= tibble::tibble(
    variety=rep(c("CV1", "CV2", "CV3"), each=16L),
    irrigation=rep(rep(c("yes", "no"), 3), each=8L),
    fertilizer=rep(rep(c("Organic", "Urea"), 6), each=4L),
    reps=c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 3, 4, 3, 1, 3, 4, 3, 2, 1, 4,
    2, 3, 1, 4, 1, 3, 1, 2, 2, 4, 1, 2, 1, 4, 2, 3, 1, 4, 2, 3, 2, 4),
    yield=c(8.379842, 8.058658, 9.73285, 9.224371999999999, NA, 6.996108000000001,
    9.865782, 7.112071666666666, 5.968758, 8.976471666666667, 7.980724, 9.35065,
    5.5111574999999995, 6.998728, 6.164252, 5.118412857142857, 7.748125, 8.58071,
    NA, NA, 7.673354999999999, 7.91948, NA, NA, 11.190445, 8.463484999999999,
    9.61818, 10.89841, 7.83943, 8.44905, 9.844165, 9.98026, 10.130675, 9.59432,
    NA, NA, 9.502525, 9.216965, NA, NA, 7.807259999999999, 9.94434, 7.92808,
    11.88664, 10.700185000000001, 10.723835000000001, 11.363140000000001,
    11.846934999999998),
    nutrients=c(0.42549600000000004, 0.417924, 0.47264, 0.45002, NA, 0.381154, 0.484084,
    0.3597316666666666, 0.32555, 0.45681666666666665, 0.38164600000000004,
    0.456822, 0.30655, 0.363892, 0.350876, 0.30200857142857146, 0.26754,
    0.30954499999999996, NA, NA, 0.328395, 0.30893, NA, NA, 0.37877, 0.33532,
    0.40417000000000003, 0.4581, 0.32077500000000003, 0.33331500000000003,
    0.39925, 0.40179000000000004, 0.40585499999999997, 0.339465, NA, NA, 0.339545,
    0.34077500000000005, NA, NA, 0.3227, 0.37770000000000004, 0.34663, 0.48564,
    0.43601500000000004, 0.38200500000000004, 0.47248500000000004, 0.506255),
)
head(df,5)
variety irrigation  fertilizer  reps  yield     nutrients
CV1     yes         Organic     1     8.379842  0.425496
CV1     yes         Organic     2     8.058658  0.417924
CV1     yes         Organic     3     9.732850  0.472640
CV1     yes         Organic     4     9.224372  0.450020
CV1     yes         Urea        1     NA        NA
.
.
.

I want to align the yields for Organic fertilizer with those for Urea fertilizer, as well as the nutrients between the two fertilizers to create regression graph between two different yield and nutrients. Initially, I attempted to use pivot_wider().

df2= data.frame(df %>%
                group_by(variety, irrigation) %>%
                pivot_wider(names_from=fertilizer, values_from=nutrients))
head(df2,8)
variety irrigation  reps     nutrients  Organic   Urea
1   CV1     yes         1    0.4254960  8.379842  NA
2   CV1     yes         2    0.4179240  8.058658  NA
3   CV1     yes         3    0.4726400  9.732850  NA
4   CV1     yes         4    0.4500200  9.224372  NA
5   CV1     yes         1    NA         NA        NA
6   CV1     yes         2    0.3811540  NA        6.996108
7   CV1     yes         3    0.4840840  NA        9.865782
8   CV1     yes         4    0.3597317  NA        7.112072
    .
    .
    .

Currently, the yields for Organic and Urea are not aligned next to each other. What I'm aiming for is a layout like the following.

variety irrigation  reps     nutrients  Organic   Urea
1   CV1     yes         1    0.4254960  8.379842  NA
2   CV1     yes         2    0.4179240  8.058658  6.996108
3   CV1     yes         3    0.4726400  9.732850  9.865782
4   CV1     yes         4    0.4500200  9.224372  7.112072
    .
    .
    .

How can I solve this problem? Also, is there any way to transpose both yield and nutrients at once?

Thanks,

2

There are 2 best solutions below

0
Jon Spring On BEST ANSWER

Here's my best guess. It looks like in a few cases you have multiple observations with the same variety/irrigation/reps/fertilizer. To address that, I add a variable obs to distinguish these.

df |>
  mutate(obs = row_number(), .by = c(variety, irrigation, reps, fertilizer)) %>%
  pivot_wider(names_from = fertilizer, values_from = c(yield, nutrients))

Result

# A tibble: 28 × 8
   variety irrigation  reps   obs yield_Organic yield_Urea nutrients_Organic nutrients_Urea
   <chr>   <chr>      <dbl> <int>         <dbl>      <dbl>             <dbl>          <dbl>
 1 CV1     yes            1     1          8.38      NA                0.425         NA    
 2 CV1     yes            2     1          8.06       7.00             0.418          0.381
 3 CV1     yes            3     1          9.73       9.87             0.473          0.484
 4 CV1     yes            4     1          9.22       7.11             0.450          0.360
 5 CV1     no             1     1          5.97       5.51             0.326          0.307
 6 CV1     no             2     1          8.98       7.00             0.457          0.364
 7 CV1     no             3     1          7.98       6.16             0.382          0.351
 8 CV1     no             4     1          9.35       5.12             0.457          0.302
 9 CV2     yes            3     1          7.75       7.67             0.268          0.328
10 CV2     yes            4     1          8.58       7.92             0.310          0.309
# ℹ 18 more rows
0
user12425014 On

I do not suspect that pivot_wider is proper function. I solved this in that way:

wu <- filter(df, fertilizer != 'Urea')
hu <- filter(df, fertilizer == 'Urea')
fr <- full_join(wu, hu, by=c("variety"="variety", "irrigation"="irrigation", "reps"="reps"), suffix=c("a","b"))
fr2 <- select(fr, "variety", "irrigation", "reps", "yielda", "nutrientsa", "yieldb")
final <- rename(fr2, `Organic`=`yielda`,`nutrients`=`nutrientsa`,`Urea`=`yieldb`)

Result:

> final
# A tibble: 30 × 6
   variety irrigation  reps Organic nutrients  Urea
   <chr>   <chr>      <dbl>   <dbl>     <dbl> <dbl>
 1 CV1     yes            1    8.38     0.425 NA   
 2 CV1     yes            2    8.06     0.418  7.00
 3 CV1     yes            3    9.73     0.473  9.87
 4 CV1     yes            4    9.22     0.450  7.11
 5 CV1     no             1    5.97     0.326  5.51
 6 CV1     no             2    8.98     0.457  7.00
 7 CV1     no             3    7.98     0.382  6.16
 8 CV1     no             4    9.35     0.457  5.12
 9 CV2     yes            3    7.75     0.268  7.67
10 CV2     yes            3    7.75     0.268 NA   
# … with 20 more rows