as.polygons in R package terra seems to be taking abnormally long time for large SpatRaster


I am converting a large raster (dimensions: 7991 x 9122) to a polygon grid using as.polygons from the terra package in R. It is taking an abnormally long time, given how quickly a small subset of the same raster (dimensions: 1000 x 1000) converts. I was hoping someone might be able to tell me a) whether I am doing something wrong in my code, or b) whether I am miscalculating how long this should take. I have been going over this for a while and seem to be stuck. Please help.

I have a raster (spGrid) that I need to convert (all non-NA values) to polygons. This is a high-resolution raster (2m x 2m grid), so I anticipated that it would take a while to convert.

library(terra)

spGrid <- rast(paste0(output_dir, "/Bayfield_raster_2m_GLSLAlbers.tif"))

> spGrid
class       : SpatRaster 
dimensions  : 7991, 9122, 1  (nrow, ncol, nlyr)
resolution  : 1.999897, 1.999897  (x, y)
extent      : 395746.7, 413989.7, 1170791, 1186772  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Great Lakes and St Lawrence Albers (EPSG:3175) 
source(s)   : memory
varname     : Bayfield_raster_2m_GLSLAlbers 
name        : Depth_sd 
min value   :        1 
max value   :        1 

# count of non-NA cells
freq(spGrid)

  layer value    count
1     1     1 21190243

plot(spGrid)

spGrid
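
In case it helps anyone reproduce the timing without my file, something like the following should give a stand-in raster with the same dimensions, extent, and CRS, and roughly the same share (~29%) of non-NA cells. The NA pattern is random rather than spatially clustered like the real data, so treat it as an approximation only:

library(terra)

# hypothetical stand-in for the real file: same dimensions, extent and CRS,
# with roughly 29% of cells set to 1 and the rest NA (random pattern)
spGrid_fake <- rast(nrows = 7991, ncols = 9122,
                    xmin = 395746.7, xmax = 413989.7,
                    ymin = 1170791, ymax = 1186772,
                    crs = "EPSG:3175")
set.seed(1)
values(spGrid_fake) <- ifelse(runif(ncell(spGrid_fake)) < 0.29, 1, NA)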

I first cropped a small subset of this raster to get an idea of how long the full conversion would take.

e_crop <- ext(396000, 398000, 1173000, 1175000)
test_rast <- crop(spGrid, e_crop)

test_rast

class       : SpatRaster 
dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
resolution  : 1.999897, 1.999897  (x, y)
extent      : 396000.7, 398000.6, 1173001, 1175001  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Great Lakes and St Lawrence Albers (EPSG:3175) 
source(s)   : memory
varname     : Bayfield_raster_2m_GLSLAlbers 
name        : Depth_sd 
min value   :        1 
max value   :        1 

freq(test_rast)

  layer value  count
1     1     1 885661

Then, I wrapped the as.polygons call in system.time to get a baseline timing for the subset.

system.time({test_grid <- as.polygons(test_rast, aggregate = FALSE, values = FALSE, na.rm = TRUE)})

   user  system elapsed 
  27.48    0.34   27.90 

Here, converting the 1,000,000-cell subset to a grid of polygons takes roughly 30 seconds.

Scaling the operation up to the full 72,893,902 cells should therefore take about 72.9 times longer, assuming the run time scales linearly with the number of cells: roughly 2,187 seconds, or about 36.5 minutes.
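
The same back-of-the-envelope estimate written out in R, using the measured 27.9 s elapsed rather than the rounded 30 s (this assumes the run time scales linearly with cell count, which may well be where my estimate is off):

t_subset <- 27.9                              # elapsed seconds for the 1000 x 1000 subset
scale_up <- ncell(spGrid) / ncell(test_rast)  # 72,893,902 / 1,000,000, about 72.9
t_subset * scale_up                           # ~2,034 seconds, i.e. roughly 34 minutes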

However, the following code has been running now for almost 6 hours.

poly_spGrid <- as.polygons(spGrid, aggregate = FALSE, values = FALSE, na.rm = TRUE)

There seems to be something wrong, but I can't figure it out. I'm hoping someone here can help. Thanks in advance.

Just an FYI: the non-NA values in the raster are all 1s. The value doesn't matter; this is just a tool to get a grid of polygons for further analysis.
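
For context only, the downstream step is essentially just handing the polygon grid off to sf, roughly like this (grid_sf is just an illustrative name, not part of the problem):

library(sf)

# illustrative only: convert the terra SpatVector of grid cells to an sf object
# for the downstream analysis
grid_sf <- st_as_sf(test_grid)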

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
 [1] devtools_2.4.5  usethis_2.2.2   raster_3.6-26  
 [4] sp_2.1-1        sf_1.0-14       terra_1.7-55   
 [7] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0  
[10] dplyr_1.1.3     purrr_1.0.2     readr_2.1.4    
[13] tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.4  
[16] tidyverse_2.0.0