I am converting a large raster (dimensions: 7991 x 9122) to a polygon grid using as.polygons from the terra package in R. It is taking abnormally long, given how quickly a small subset of the same raster (dimensions: 1000 x 1000) converts. I'm hoping someone can tell me whether (a) I am doing something wrong in the code, or (b) I am miscalculating how long this should take. I have been going over it for a while and am stuck. Please help.
I have a raster (spGrid) whose non-NA cells I need to convert to polygons. It is a high-resolution raster (2 m x 2 m cells), so I anticipated that the conversion would take a while.
library(terra)
spGrid <- rast(paste0(output_dir, "/Bayfield_raster_2m_GLSLAlbers.tif"))
> spGrid
class       : SpatRaster
dimensions  : 7991, 9122, 1  (nrow, ncol, nlyr)
resolution  : 1.999897, 1.999897  (x, y)
extent      : 395746.7, 413989.7, 1170791, 1186772  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Great Lakes and St Lawrence Albers (EPSG:3175)
source(s)   : memory
varname     : Bayfield_raster_2m_GLSLAlbers
name        : Depth_sd
min value   :        1
max value   :        1
# non-NA cells
freq(spGrid)
  layer value    count
1     1     1 21190243
plot(spGrid)
To get an idea of how long the full conversion would take, I first cropped a 1000 x 1000 subsection of the same raster.
e_crop <- ext(396000, 398000, 1173000, 1175000)
test_rast <- crop(spGrid, e_crop)
test_rast
class       : SpatRaster
dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
resolution  : 1.999897, 1.999897  (x, y)
extent      : 396000.7, 398000.6, 1173001, 1175001  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Great Lakes and St Lawrence Albers (EPSG:3175)
source(s)   : memory
varname     : Bayfield_raster_2m_GLSLAlbers
name        : Depth_sd
min value   :        1
max value   :        1
freq(test_rast)
  layer value  count
1     1     1 885661
Then I ran as.polygons on the subset, wrapped in system.time, to get a time estimate.
system.time({test_grid <- as.polygons(test_rast, aggregate = FALSE, values = FALSE, na.rm = TRUE)})
   user  system elapsed
  27.48    0.34   27.90
Here, I get ~30 seconds to produce a grid of polygons for 1,000,000 cells.
Scaling up to the full raster's 72,893,902 cells, and assuming the time scales linearly with the number of cells, the conversion should take about 72.9 times as long: roughly 2,187 seconds, or about 36 minutes.
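Spelling that back-of-envelope estimate out in R, in case my arithmetic is off:

cells_test <- 1000 * 1000            # cells in the test crop
cells_full <- 7991 * 9122            # 72,893,902 cells in the full raster
t_test <- 30                         # ~ elapsed seconds for the test run
t_test * cells_full / cells_test     # ~2187 s, i.e. ~36 minutes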
However, the following code has now been running for almost 6 hours.
poly_spGrid <- as.polygons(spGrid, aggregate = FALSE, values = FALSE, na.rm = TRUE)
There seems to be something wrong, but I can't figure it out. I'm hoping someone here can help. Thanks in advance.
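In case the answer turns out to be "process it in pieces", this is roughly what I had in mind, though I have not tried it yet. An untested sketch; the ~2000-cell tile size and the tile_.tif filenames are arbitrary guesses:

# split the big raster into tiles written to disk (tile size is a guess)
tmpl <- rast(ext(spGrid), resolution = res(spGrid) * 2000, crs = crs(spGrid))
tiles <- makeTiles(spGrid, tmpl, filename = "tile_.tif", overwrite = TRUE)
# polygonize each tile separately, then combine the pieces
polys <- lapply(tiles, function(f) {
  as.polygons(rast(f), aggregate = FALSE, values = FALSE, na.rm = TRUE)
})
poly_spGrid <- do.call(rbind, polys)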
Just an FYI: the non-NA values in the raster are all 1s. The value itself doesn't matter; the raster is just a tool to get a grid of polygons for further analysis.
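Also, in case anyone wants to reproduce the timing without my data, a dummy raster of all 1s along these lines should be a reasonable stand-in (the dimensions match my test crop; everything else about it is made up):

library(terra)
r <- rast(nrows = 1000, ncols = 1000, vals = 1)  # dummy 1000 x 1000 raster of 1s
system.time(
  p <- as.polygons(r, aggregate = FALSE, values = FALSE, na.rm = TRUE)
)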
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

other attached packages:
 [1] devtools_2.4.5  usethis_2.2.2   raster_3.6-26
 [4] sp_2.1-1        sf_1.0-14       terra_1.7-55
 [7] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0
[10] dplyr_1.1.3     purrr_1.0.2     readr_2.1.4
[13] tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.4
[16] tidyverse_2.0.0