Casting the difference between two vectors into a specified format

59 Views Asked by At

I am trying to create code that automatically creates data in the format below (for more background on the colspan see this link).

I need to do two things:

  1. Calculate the "difference" or "distance" for lack of a better word between two vectors.
  2. Somehow cast or transpose the data.

I have no idea where to start, so I was hoping someone could help me get started.

Desired format

lower_threshold <- data.frame(row=c("Lower threshold"), freq = c(0, 25, 50, 100, 250, 500, 1000, 1500, 3000, "Infinity", "SUM"), width=50, colspan=1)

               row  freq width colspan
1  Lower threshold     0    50      1
2  Lower threshold    25    50      1
3  Lower threshold    50    50      1
4  Lower threshold   100    50      1
5  Lower threshold   250    50      1
6  Lower threshold   500    50      1
7  Lower threshold  1000    50      1
8  Lower threshold  1500    50      1
9  Lower threshold  3000    50      1
10 Lower threshold   SUM    50      1

Data to be formatted

The data I have is as follows:

dat <- structure(list(Name = c("Name_a", 
"Name_b"), freq = list(bb = c(5, 4, 3, 2, 14), aa = c(1, 6, 0, 7)), columns_used = list(bb = c(0, 25, 100, 500, "Infinity", "SUM"), aa = c(0, 25, 3000, "Infinity", "SUM"))), row.names = c(NA, 
-2L), class = "data.frame")

enter image description here

Calculating colspan/width

I want to calculate colspan and width (which is simply 50*colspan) based on the difference between the columns_used column and the following vector:

total_colspan = c(0, 25, 50, 100, 250, 500, 1000, 1500, 3000, "Infinity", "SUM")

As an example, for Name_a, the first colspan would be 1 (and width=50), because the first threshold is 0-25 for both vectors. However Name_a skips the number 50, so the next colspan should be 2 (and width=100).

Desired output:

   row         row freq  width colspan
1  Name_a              5    50      1
2  Name_a              4   100      2
3  Name_a              3   100      2
4  Name_a              2   200      4
5  Name_a             14    50      1
6  Name_b              1    50      1
...
11 Name_b              7    50      1

Where the sum of colspan for each Name_x should be 10 in total and row_freq comes from the freq column in dat.

The idea is to use the desired output to create a hmtl table:

enter image description here

1

There are 1 best solutions below

3
On BEST ANSWER

It can certainly be improved, but here's a rough way to do it:

l <- lapply(dat$columns_used, \(y) sapply(y, \(x) which(total_colspan == x) - which(y == x)))

dat[1:2] %>% 
  unnest_longer(freq) %>% 
  mutate(colspan = unlist(lapply(l, \(x) diff(x) + 1)),
         width = colspan * 50)

# A tibble: 9 x 4
  Name    freq colspan width
  <chr>  <dbl>   <dbl> <dbl>
1 Name_a     5       1    50
2 Name_a     4       2   100
3 Name_a     3       2   100
4 Name_a     2       4   200
5 Name_a    14       1    50
6 Name_b     1       1    50
7 Name_b     6       7   350
8 Name_b     0       1    50
9 Name_b     7       1    50

Note: \(x) is a shortcut for function(x).