I have a spatial data set and I would like to read to memory only those rows that match a given value in a certain column. Here is a reproducible example with what I've tried.
Using a shape file, I can read the entire data set, but the code throws and error when I try to filter by column value. It looks like it does not recognize the column name
# read entire data set
shp_file <- system.file("shape/nc.shp", package="sf")
data <- read_sf(shp_file, query = "SELECT * FROM nc")
# select rows with specific column value
data <- read_sf(file, query = "SELECT * FROM nc WERE FIPSNO = 37009")
> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,
> : Query execution failed, cannot open layer. In addition: Warning
> message: In CPL_read_ogr(dsn, layer, query, as.character(options),
> quiet, : GDAL Error 1: In ExecuteSQL(): sqlite3_prepare_v2(SELECT *
> FROM nc WERE FIPSNO = 37009): near "FIPSNO": syntax error
Now, using a data set in saved in geopackage format, I cannot even read the entire data set to memory. In this case, it does not seem to recognize the table.
gpkg_file <- system.file("gpkg/nc.gpkg", package="sf")
data <- read_sf(gpkg_file, query = "SELECT * FROM nc")
> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,
> : Query execution failed, cannot open layer. In addition: Warning
> message: In CPL_read_ogr(dsn, layer, query, as.character(options),
> quiet, : GDAL Error 1: In ExecuteSQL(): sqlite3_prepare_v2(SELECT *
> FROM nc): no such table: nc
Your approach feels reasonable; it is somewhat difficult to troubleshoot it without having access to your actual dataset, but we can do with the world geopackage that ships with
{spData}
package.The my first step would be to check the names of layers stored in your geopackage; this will tell you what are legitimate tables to which you can point your WHERE clause.
The second step is running your actual read operation; while
sf::st_read()
andsf::read_sf()
are in principle the same thing - they differ only in defaults for some of the optional parameters - I suggest to usesf::st_read()
when in doubt.In instances of troubleshooting it is also a good idea to assign your parameters explicitly, and not rely on position (dsn should be the first argument by position, but having it assigned by name is bulletproof).
While I believe that reasoneable people can differ in their opinion whether Russian Federation or French Guiana actually belong to Europe the SQL query on technical level clearly works.