Get the correspondent values based on variables names stored in another dataframe?

161 Views Asked by At

In one tibble I have values that are variable names in another tibble. I need to list the values in a specific row in the second tibble of the column names contained in each row of the first tibble in a fast way. How can I do it?

My data has the form:

# Observations in is tibble says which variables of the tibble_b occurred at each case:
tibble_a <- tribble(
        ~a, ~b, ~c,
         4,  2,  5,
         4,  3,  1,
         2,  5,  5
        )
# This tibble contains values for the variables at different times:
tibble_b <- tribble(
        ~`1`, ~`2`, ~`3`, ~`4`, ~`5`,
         100,  112,   98,  107,   85,
          71,   76,   97,   80,   15,
           2,   61,   54,   37,   42
         )

I need to create a tibble that contain the values in tibble_b for the numbers in the first one, for, let's say, the first row, for all the rows in tibble_a. That is, basically, to obtain this tabular data:

a b c
107 112 85
107 98 100
112 85 85

My problem is, tibble_a has hundreds of thousands of rows, and, of course, both has more columns. Since I have to perform tests and comparisons on multiple samples of row indexes of tibble_b, using something like below is very time consuming.

for(current_row in 1:nrow(tibble_a)) { 
  for(current_col in 1:3) {
   tibble_of_results[[current_row, current_col]] <- tibble_b[[1, tibble_a[[current_row, current_col]]]] 
  }
}

Can someone suggest any efficient way to accomplish this task?

4

There are 4 best solutions below

0
LMc On BEST ANSWER
library(dplyr)

tibble_a |>
  mutate(across(everything(), \(x) unlist(tibble_b[1, x], use.names = F)))
#       a     b     c
#   <dbl> <dbl> <dbl>
# 1   107   112    85
# 2   107    98   100
# 3   112    85    85

You can use the values of the column to column index tibble_b. For example, for tibble_a[[1]]:

tibble_b[1, c(4, 4, 2)]
#     `4`   `4`   `2`
#   <dbl> <dbl> <dbl>
# 1   107   107   112

We then simply convert that into a vector using unlist. Then we can apply this approach across all columns using across(everything(), ...).

0
TarJae On

Here is functional programming approach using purrr package:

fetch_values <- function(row) {
  map_dbl(tibble_a[row, ], ~tibble_b[[1, .x]])
}

map_dfr(seq_len(nrow(tibble_a)), fetch_values)

 A tibble: 3 × 3
      a     b     c
  <dbl> <dbl> <dbl>
1   107   112    85
2   107    98   100
3   112    85    85
0
M-- On
library(dplyr)
library(tidyr)

tibble_a %>% 
  tibble::rownames_to_column("id") %>% 
  pivot_longer(-id, values_to = "var") %>% 
  right_join(., 
             {tibble_b[1,] %>% pivot_longer(everything(), 
                                names_transform = as.numeric, 
                                names_to = "var")})  %>% 
  pivot_wider(id_cols = -var)

#> Joining with `by = join_by(var)`
#> # A tibble: 3 × 4
#>   id        a     b     c
#>   <chr> <dbl> <dbl> <dbl>
#> 1 1       107   112    85
#> 2 2       107    98   100
#> 3 3       112    85    85

Created on 2023-12-28 with reprex v2.0.2

3
Friede On

If it's just about the first row of tibble_b (not sure) we can do

quickdf = \(l) {
  class(l) = "data.frame"
  attr(l, "row.names") = .set_row_names(length(l[[1L]]))
  l
}
tibble_a = quickdf(tibble_a) # tibble_a = as.data.frame(tibble_a) 
tibble_a[] = as.numeric(tibble_b[1L, ]
                 [match(unlist(tibble_a, use.names = FALSE), colnames(tibble_b))])

giving

> tibble_a
    a   b   c
1 107 112  85
2 107  98 100
3 112  85  85

Read more about quickdf() Chapter 24.4.2.