Is there a way to filter out bus routes with a certain name from a GTFS feed?


I am dealing with a tricky GTFS feed from the Belgian public transport operator De Lijn. It lists belbus services (demand-responsive buses) as regular bus routes that run every hour, which makes poorly served rural areas misleadingly appear highly accessible, with excellent public transport connections.

In routes.txt, they are listed like this:

route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
61135,1,460,Belbus Vlaamse Ardennen,Belbus Vlaamse Ardennen/Belbus Vlaamse Ardennen,3,,FFFFFF,000099

I would like to know how I can filter out any routes with "Belbus" in their route_desc or route_long_name.

At first I tried to find them in Excel, delete them, and save the result as routes.txt. That did not work when I calculated stop-level frequency in ArcGIS: I suppose it only looks at stop_times.txt and does not check whether the corresponding rows in routes.txt are missing.
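
For illustration, here is a minimal base R sketch of why deleting rows from routes.txt alone is not enough: stop_times.txt is linked to routes.txt only through trips.txt, so the removal has to be carried through all three tables. The tables below are toy data; apart from the Belbus route_id and name quoted above, every value is made up.

```r
# Toy stand-ins for routes.txt, trips.txt and stop_times.txt.
# Only route 61135 comes from the feed; everything else is invented.
routes <- data.frame(
  route_id = c("61135", "10001"),
  route_long_name = c("Belbus Vlaamse Ardennen", "Regular line"),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  trip_id = c("t1", "t2", "t3"),
  route_id = c("61135", "61135", "10001"),
  stringsAsFactors = FALSE
)
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t2", "t3", "t3"),
  stop_id = c("s1", "s2", "s1", "s1", "s3"),
  stringsAsFactors = FALSE
)

# 1. Route ids whose long name mentions "Belbus"
belbus_routes <- routes$route_id[grepl("Belbus", routes$route_long_name)]

# 2. Trips that run on those routes
belbus_trips <- trips$trip_id[trips$route_id %in% belbus_routes]

# 3. Drop the matching rows from every table, not just routes
routes_kept <- routes[!routes$route_id %in% belbus_routes, ]
trips_kept <- trips[!trips$route_id %in% belbus_routes, ]
stop_times_kept <- stop_times[!stop_times$trip_id %in% belbus_trips, ]
```

After this, stop_times_kept only contains stop times of the remaining regular route, so a frequency calculation based on stop_times.txt no longer counts the belbus trips.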

I also tried gtfstools to filter by route_type, but unfortunately that is all or nothing: it either removes every bus route or none of them.

There are 2 answers below.

Best answer

{gtfstools} maintainer here.

What I'd do:

library(gtfstools)

path <- "path_to_gtfs.zip"

gtfs <- read_gtfs(path)

# select route ids whose route_long_name includes "Belbus"
selected_routes <- gtfs$routes[grepl("Belbus", route_long_name)]$route_id

# filter them out of the gtfs object
filtered_gtfs <- filter_by_route_id(gtfs, selected_routes, keep = FALSE)
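
Since the question mentions both route_long_name and route_desc, the same idea could be extended as in the sketch below. The helper find_routes_by_name is hypothetical (not part of gtfstools), the case-insensitive match is an assumption, and the output file name is a placeholder. The gtfstools calls only run if the package is installed and the file exists.

```r
# Hypothetical helper: route ids whose route_long_name or route_desc
# matches the pattern. Columns missing from the table are skipped.
find_routes_by_name <- function(routes, pattern = "belbus") {
  cols <- intersect(c("route_long_name", "route_desc"), names(routes))
  hit <- rep(FALSE, nrow(routes))
  for (col in cols) {
    # grepl() returns FALSE for NA values, so empty fields never match
    hit <- hit | grepl(pattern, routes[[col]], ignore.case = TRUE)
  }
  routes$route_id[hit]
}

path <- "path_to_gtfs.zip"  # placeholder, as in the code above

if (requireNamespace("gtfstools", quietly = TRUE) && file.exists(path)) {
  gtfs <- gtfstools::read_gtfs(path)
  belbus_ids <- find_routes_by_name(gtfs$routes)
  filtered_gtfs <- gtfstools::filter_by_route_id(gtfs, belbus_ids, keep = FALSE)
  # Write the filtered feed back to disk so other tools can read it
  gtfstools::write_gtfs(filtered_gtfs, "filtered_gtfs.zip")
}
```

Writing the filtered feed to a new zip file leaves the original untouched and gives a file that can be loaded into ArcGIS or any other GTFS tool.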

Another answer

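I recommend filtering the rows with the str_detect function from the stringr package.

library(dplyr)
library(stringr)
# df is assumed to be the routes table (routes.txt) read into a data frame
# the leading ! drops the rows whose route_long_name contains "Belbus"
df_filtered <- df %>% filter(!str_detect(route_long_name, "Belbus"))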