Using the answer from a previous question (Structure Query for osmextract), I want to use the R package osmextract to extract a subset of keys/tags (e.g. shop=supermarket) from the whole Europe.osm.pbf provided by Geofabrik.
I am wondering whether it is possible to avoid having oe_get() translate ALL features of a layer (e.g. points, lines or multipolygons) into a gpkg in the first place, and instead have it translate only a subset of features (e.g. only amenity=school).
The previous response does this in two steps:
- Load the data and put each layer into a gpkg
- Select specific features from a layer within this gpkg with a query
This produces a possibly huge gpkg in the first step, from which a selection is then made. Doing both steps in one would save processing time and disk space. I am not used to working with databases, which is why I don't know whether this kind of procedure is even possible with pbf files, or whether the translation into a gpkg is necessary before a query of this sort can be run at all.
Some example code from my attempt with a mid-sized pbf file (Austria):
library(osmextract)
## STEP 1: (Download and) convert data into gpkg. (NB: skipping the download here because the file is already downloaded)
## Each layer (line, multipolygon, point) will be consecutively added to the gpkg 'geofabrik_austria-latest'
oe_get("Austria", layer = "lines", provider = "geofabrik", force_download = FALSE, extra_tags = c("maxspeed", "oneway"), download_only = TRUE)
# This took about 8 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
oe_get("Austria", layer = "multipolygons", provider = "geofabrik", force_download = FALSE, download_only = TRUE)
# This took about 13 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
oe_get("Austria", layer = "points", provider = "geofabrik", force_download = FALSE, extra_tags = c("amenity", "shop"), download_only = TRUE)
# This took about 2 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
# Total processing time: about 23 mins (the Europe.osm.pbf file is 40 times bigger, so it is estimated to take 23 min x 40 = 15.3 hrs)
# The gpkg file is 5.9 times bigger than the pbf file (the size of geofabrik_europe-latest.gpkg is estimated at 158.4 GB)
## STEP 2: Manipulate gpkg: Select keys/features
austria_roads_lines_gpkg <- oe_get(
  place = "Austria",
  layer = "lines",
  query = "SELECT * FROM lines WHERE highway IN ('motorway', 'trunk', 'primary', 'secondary', 'tertiary', 'unclassified', 'residential', 'motorway_link', 'trunk_link', 'primary_link', 'secondary_link', 'tertiary_link', 'living_street', 'service')",
  quiet = TRUE
)
austria_shops_multipoly_gpkg <- oe_get(
  place = "Austria",
  layer = "multipolygons",
  query = "SELECT * FROM multipolygons WHERE shop IN ('convenience', 'greengrocer', 'supermarket')",
  quiet = TRUE
)
austria_shops_points_gpkg <- oe_get(
  place = "Austria",
  layer = "points",
  query = "SELECT * FROM points WHERE shop IN ('convenience', 'greengrocer', 'supermarket')",
  quiet = TRUE
)
I actually found a solution myself: an SQL-like filter can be included in the character vector of options (the vectortranslate_options argument) that oe_get() passes to ogr2ogr during the vectortranslate step, so that only the matching features are written to the gpkg. Here is example code for extracting marketplaces and shops from the points and multipolygons layers of the Europe.osm.pbf from Geofabrik:
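As a rough sketch of this approach on the smaller Austria extract (not the code from the original answer): the filter is handed to ogr2ogr through oe_get()'s vectortranslate_options argument, so only matching features should end up in the gpkg. The helper build_where_options() and the wrapper get_austria_shop_points() are hypothetical names introduced purely for illustration, and the sketch assumes a recent osmextract version that fills in the remaining default ogr2ogr options (output format, layer name, config file) by itself.

```r
# Hypothetical helper (not part of osmextract): build the ogr2ogr "-where"
# option pair for a single OSM key and a set of accepted values.
build_where_options <- function(key, values) {
  quoted <- paste0("'", values, "'", collapse = ", ")
  c("-where", paste0(key, " IN (", quoted, ")"))
}

shop_filter <- build_where_options(
  "shop",
  c("convenience", "greengrocer", "supermarket")
)

# Wrapped in a function so that nothing is downloaded or translated
# until it is called explicitly.
get_austria_shop_points <- function() {
  osmextract::oe_get(
    place = "Austria",
    layer = "points",
    # "shop" is not a default column of the points layer, so request it
    extra_tags = "shop",
    # Assumption: the filter is applied while the pbf is translated to gpkg
    vectortranslate_options = shop_filter,
    quiet = TRUE
  )
}
```

Calling get_austria_shop_points() would then run the download (if needed) and the filtered translation in one go. The same pattern should carry over to the multipolygons layer by changing the layer argument.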