Re-arrange rows by longest interval between starttime and endtime

35 Views Asked by At

In this type of data:

df
# A tibble: 9 × 6
     id Utterance             Story        Climax              Starttime_ms Endtime_ms
  <dbl> <chr>                 <chr>        <chr>                      <dbl>      <dbl>
1     4 "yeah"                 NA           NA                        20405      23532
2     5 "Come Home "          "Come Home "  NA                        20405      47677 #<--
3     6 ">last time "          NA           NA                        23818      25110
4     7 "two weeks ago?  "     NA           NA                        25470      26259
5     8 "and X"                NA           NA                        26623      32103
6     9 "and then last night"  NA           NA                        32688      33797
7    10 "are you sure?"        NA           NA                        34099      37542
8    11 "Come Home climax "    NA          "Come Home climax "        34099      39895 #<---
9    12 "=she said  Y"         NA           NA                        38075      39895

I need to re-arrange the rows in such a way that those rows

  • where the interval between Starttime_ms and Endtime_ms is larger than in the previous row AND
  • where the Startttime_ms is the same as in the previous row

are put before that previous row. How can that be done?

The desired output is this:

df
# A tibble: 9 × 6
     id Utterance             Story        Climax              Starttime_ms Endtime_ms
  <dbl> <chr>                 <chr>        <chr>                      <dbl>      <dbl>
2     5 "Come Home "          "Come Home "  NA                        20405      47677
1     4 "yeah"                 NA           NA                        20405      23532
3     6 ">last time "          NA           NA                        23818      25110
4     7 "two weeks ago?  "     NA           NA                        25470      26259
5     8 "and X"                NA           NA                        26623      32103
6     9 "and then last night"  NA           NA                        32688      33797
8    11 "Come Home climax "    NA          "Come Home climax "        34099      39895
7    10 "are you sure?"        NA           NA                        34099      37542
9    12 "=she said  Y"         NA           NA                        38075      39895

Data:

df <- structure(list(id = c(4, 5, 6, 7, 8, 9, 10, 11, 12), Utterance = c("yeah", 
                                                                         "Come Home ", ">last time ", "two weeks ago?  ", 
                                                                         "and X", 
                                                                         "and then last night", "are you sure?", 
                                                                         "Come Home climax ", "=she said  Y"
), Story = c(NA, "Come Home ", NA, NA, NA, NA, NA, NA, NA), Climax = c(NA, 
                                                                       NA, NA, NA, NA, NA, NA, "Come Home climax ", NA), Starttime_ms = c(20405, 
                                                                                                                                          20405, 23818, 25470, 26623, 32688, 34099, 34099, 38075), Endtime_ms = c(23532, 
                                                                                                                                                                                                                  47677, 25110, 26259, 32103, 33797, 37542, 39895, 39895)), row.names = c(NA, 
                                                                                                                                                                                                                                                                                          -9L), class = c("tbl_df", "tbl", "data.frame"))
1

There are 1 best solutions below

2
Ben Bolker On

This is not a convenient format for replication (see ?dput or the reprex package), but something like

library(dplyr)
df_sort <- (df
   |> mutate(dt = endtime - starttime)
   |> arrange(starttime, desc(dt))
)

should work (the first argument to arrange is the primary sort key; later arguments are used as tiebreakers/sorting within primary categories). You can add a select(-dt) if you don't want to keep the time-difference variable around.