Create a summarised column that is a list of values

Here is an example of the data frame I have (it is the "stoptimes" table in a GTFS file):

stoptimes <- data.frame(route = c("route1", "route1", "route1", "route2", "route2", "route2", "route3", "route3", "route3"),
                    stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1", "stop3", "stop4", "stop5"))

I would like to build a data frame (or list) with one entry per distinct stop (5 here) that associates each stop with a list of all routes that pass through that stop.

How can I build this in R?

For context, later I would like to merge this with the location of each stop, and then create a variable in another data frame that has the number of distinct routes available within a radius of certain other points.

Best answer

It looks like the OP wants to summarise the data grouped by stops.

library(dplyr)  # the .by argument requires dplyr >= 1.1.0

stoptimes |> 
    summarise(routes = list(route), .by = stops)

  stops                 routes
1 stop1         route1, route2
2 stop2         route1, route2
3 stop3 route1, route2, route3
4 stop4                 route3
5 stop5                 route3

This outputs a list-column for the routes variable. We may want a simpler (although less "tidy") output, with one character string per stop describing the routes, which can be achieved with paste or toString:

stoptimes |> 
    summarise(routes = toString(route), .by = stops)
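
The paste variant mentioned above is not shown, so here is a minimal sketch of it. toString is equivalent to paste with collapse = ", "; calling paste directly lets us pick another separator. The " | " separator and the name routes_chr are only illustrative choices.

```r
library(dplyr)

stoptimes <- data.frame(
  route = c("route1", "route1", "route1", "route2", "route2", "route2",
            "route3", "route3", "route3"),
  stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
            "stop3", "stop4", "stop5")
)

# Collapse the routes of each stop into a single string,
# using a custom separator instead of toString()'s ", "
routes_chr <- stoptimes |>
  summarise(routes = paste(route, collapse = " | "), .by = stops)

routes_chr$routes[3]  # the string for stop3
```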

This simple answer works for the data given. If a route appears more than once for the same stop, we may have to wrap unique around the routes variable, as in `routes = list(unique(route))`.
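
A small sketch of the duplicate case, assuming a made-up data frame (stoptimes_dup) in which route1 serves stop1 twice, as could happen when the table has one row per trip. The n_routes column is an extra added here for illustration; it is not part of the original answer.

```r
library(dplyr)

# Hypothetical data: route1 appears twice at stop1
stoptimes_dup <- data.frame(
  route = c("route1", "route1", "route2", "route2"),
  stops = c("stop1", "stop1", "stop1", "stop2")
)

routes_by_stop <- stoptimes_dup |>
  summarise(
    routes   = list(unique(route)),  # distinct routes, kept as a list-column
    n_routes = n_distinct(route),    # number of distinct routes per stop
    .by = stops
  )

routes_by_stop$routes[[1]]  # distinct routes serving stop1
routes_by_stop$n_routes
```

Without unique, the first element of the list-column would contain route1 twice.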

There is also an option with tidyr::nest, which will create a nested data.frame with individual tibbles for every value of stops:

library(tidyr)  # nest(.by = ) requires tidyr >= 1.3.0

stoptimes_nested <-
    stoptimes |> 
    nest(.key = "routes",
         .by = stops)

stoptimes_nested$routes

[[1]]
# A tibble: 2 × 1
  route 
  <chr> 
1 route1
2 route2

[[2]]
# A tibble: 2 × 1
  route 
  <chr> 
1 route1
2 route2

[[3]]
# A tibble: 3 × 1
  route 
  <chr> 
1 route1
2 route2
3 route3

[[4]]
# A tibble: 1 × 1
  route 
  <chr> 
1 route3

[[5]]
# A tibble: 1 × 1
  route 
  <chr> 
1 route3
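
For the follow-up step described in the question (merging with stop locations, then counting the distinct routes available within a radius of other points), one possible sketch is below. Everything beyond stoptimes is an assumption made for illustration: the coordinates in stop_locations, the pois table, the 500 m radius, and the haversine_m helper. In a real GTFS feed the coordinates would come from the stops table, and a dedicated spatial package may be a better fit for large data.

```r
library(dplyr)

stoptimes <- data.frame(
  route = c("route1", "route1", "route1", "route2", "route2", "route2",
            "route3", "route3", "route3"),
  stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
            "stop3", "stop4", "stop5")
)

# Hypothetical stop coordinates (invented for this example)
stop_locations <- data.frame(
  stops = paste0("stop", 1:5),
  lat   = c(0.000, 0.001, 0.002, 0.050, 0.051),
  lon   = c(0, 0, 0, 0, 0)
)

# Hypothetical points of interest
pois <- data.frame(id = c("A", "B"), lat = c(0.001, 0.0505), lon = c(0, 0))

# Great-circle distance in metres (haversine formula, spherical Earth)
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

# One row per stop/route pair, with the stop's coordinates attached
stops_routes <- stoptimes |>
  distinct(stops, route) |>
  left_join(stop_locations, by = "stops")

radius_m <- 500  # assumed search radius

# For each point, count the distinct routes at stops within the radius
pois$n_routes <- vapply(seq_len(nrow(pois)), function(i) {
  d <- haversine_m(pois$lat[i], pois$lon[i],
                   stops_routes$lat, stops_routes$lon)
  length(unique(stops_routes$route[d <= radius_m]))
}, integer(1))

pois
```

Working from the long stop/route pairs rather than the list-column keeps the radius filter and the distinct count in plain vector operations.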