Proper way to iterate over data frame columns and obtain a paired list of values (col1 -> col2, col2 -> col3, col3 -> col4, etc.)

Situation: I have a data frame where each row expresses a change in values over time: within a row, the value may or may not change from one column to the next, over a potentially large number of columns.

Goal: The final objective is to obtain a complete sequence of pairs of values that express these changes over time, from one column to the adjacent one (i.e. col1 -> col2, col2 -> col3, col3 -> col4, etc.).

For now, I iterate over the data frame columns with a for loop, select two successive columns, rbind their values onto the result, and filter the result if needed (see the minimal example below).

Question: Is there a better solution than my for loop, or even a function dedicated to iterating over columns?

Current Approach

library(tidyverse)

# Fake data frame
test <- data.frame(
  var1 = 2:5,
  var2 = c(1, 3:5),
  var3 = c(1, 3, 4, 8),
  var4 = c(2:4, 8)
)

cn <- colnames(test)
# Placeholder row so rbind() has something to append to; removed by na.omit() below
graph <- data.frame(from = NA, to = NA, name = NA)

# Iterate over the columns of 'test' and build the paired list of values
for (i in seq_len(ncol(test) - 1)) {
  graph <- rbind(
    graph,
    test %>%
      select(from = all_of(i), to = all_of(i + 1)) %>%
      mutate(name = cn[i + 1])
  )
}
graph <- na.omit(graph)
# Then I'll use a filter if I want to track changes, e.g. filter(graph, from != to)

There are 3 answers below

Answer by ThomasIsCoding (best answer)

You can try Map(), which walks the "from" columns, the "to" columns, and the "to" column names in parallel:

`row.names<-`(
   do.call(
      rbind,
      Map(
         \(x, y, z) data.frame(from = x, to = y, name = z),
         test[-length(test)],
         test[-1],
         names(test)[-1]
      )
   ), NULL
)

Or, even more straightforwardly (thanks to @thelatemail for this option):

data.frame(
   from = unlist(test[-length(test)], use.names = FALSE),
   to = unlist(test[-1], use.names = FALSE),
   name = rep(names(test[-1]), each = nrow(test))
)

which gives

   from to name
1     2  1 var2
2     3  3 var2
3     4  4 var2
4     5  5 var2
5     1  1 var3
6     3  3 var3
7     4  4 var3
8     5  8 var3
9     1  2 var4
10    3  3 var4
11    4  4 var4
12    8  8 var4
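
The question mentions filtering afterwards to track only the actual changes. A minimal base R sketch of that last step, assuming the `test` data frame from the question; the names `pairs` and `changes` are only illustrative:

```r
# Data frame from the question
test <- data.frame(
  var1 = 2:5,
  var2 = c(1, 3:5),
  var3 = c(1, 3, 4, 8),
  var4 = c(2:4, 8)
)

# All adjacent-column pairs, built as in the vectorised version above
pairs <- data.frame(
  from = unlist(test[-length(test)], use.names = FALSE),
  to   = unlist(test[-1], use.names = FALSE),
  name = rep(names(test)[-1], each = nrow(test))
)

# Keep only the rows where the value actually changed
changes <- subset(pairs, from != to)
changes
```

subset() treats an NA comparison as "no match", so rows with a missing `from` or `to` are dropped here; whether that is the behaviour you want depends on your data.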

Answer by Friede

Base R, keeping your loop approach but iterating over seq_len(ncol(test) - 1L) with lapply():

do.call(rbind,
        lapply(seq_len(ncol(test) - 1L), \(i) {
          x = test[c(i, i + 1L)]                    # two adjacent columns
          x = transform(x, name = names(x)[[2L]])   # label with the "to" column
          setNames(x, c("from", "to", "name"))
        })
)

which gives

   from to name
1     2  1 var2
2     3  3 var2
3     4  4 var2
4     5  5 var2
5     1  1 var3
6     3  3 var3
7     4  4 var3
8     5  8 var3
9     1  2 var4
10    3  3 var4
11    4  4 var4
12    8  8 var4
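
If you would rather keep an explicit for loop, one common pattern is to fill a preallocated list and bind it once at the end. This is only a sketch, assuming the `test` data frame from the question; `n_pairs` and `pieces` are illustrative names. It avoids both the NA placeholder row and the na.omit() call, which would also drop any genuine NA values in the data:

```r
# Data frame from the question
test <- data.frame(
  var1 = 2:5,
  var2 = c(1, 3:5),
  var3 = c(1, 3, 4, 8),
  var4 = c(2:4, 8)
)

n_pairs <- ncol(test) - 1L
pieces  <- vector("list", n_pairs)   # one slot per adjacent column pair

for (i in seq_len(n_pairs)) {
  pieces[[i]] <- data.frame(
    from = test[[i]],
    to   = test[[i + 1L]],
    name = names(test)[i + 1L]       # label each pair with its "to" column
  )
}

# Bind once at the end instead of growing the data frame inside the loop
graph <- do.call(rbind, pieces)
graph
```

Binding once at the end also avoids copying the accumulated result on every iteration, which can matter when there are many columns.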

Answer by Adriano Mello

It's possible with just a bit of wrangling:

library(tidyverse) 

test %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id, names_to = "name", values_to = "to") %>% 
  mutate(.by = id, from = dplyr::lag(to, default = NA_real_)) %>% 
  filter(!is.na(from)) %>% 
  arrange(name) %>% 
  select(from, to, name)

# ----------------------

# A tibble: 12 × 3
    from    to name 
   <dbl> <dbl> <chr>
 1     2     1 var2 
 2     3     3 var2 
 3     4     4 var2 
 4     5     5 var2 
 5     1     1 var3 
 6     3     3 var3 
 7     4     4 var3 
 8     5     8 var3 
 9     1     2 var4 
10     3     3 var4 
11     4     4 var4 
12     8     8 var4