I am trying to understand how to parallelize raster processing in R. My goal is to parallelize the following on multiple cores with multiple rasters. I process my raster blockwise and I am trying to parallelize this with mclapply or other functions. First I want to get the values of one raster or a RasterStack, and then I want to write the values to the output object. When I use multiple cores it does not work, because different subprocesses try to write at the same time. Does anybody know a solution for that?
So here is the process:
Get and create the data:
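library(raster)
r <- raster(system.file("external/test.grd", package="raster"))
# empty output raster with the same geometry as r
s <- raster(r)
# block layout: first row (tr$row), row count (tr$nrows) and number (tr$n) of blocks
tr <- blockSize(r)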
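Because everything below depends on the layout returned by blockSize(), here is a small sanity check, as a sketch, that the blocks are consecutive, non-overlapping and cover every row of the raster. It only uses the row, nrows and n elements already referenced in the question.

```r
library(raster)

r  <- raster(system.file("external/test.grd", package = "raster"))
tr <- blockSize(r)

# tr$row: first row of each block; tr$nrows: rows per block; tr$n: number of blocks
str(tr)

# The blocks tile the raster: each block starts where the previous one ended
stopifnot(
  tr$n == length(tr$row),
  tr$row[1] == 1,
  sum(tr$nrows) == nrow(r),
  all(diff(tr$row) == head(tr$nrows, -1))
)
```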
Then read and write the values block by block with getValues and writeValues in a for loop:
s <- writeStart(s[[1]], filename='test.grd', overwrite=TRUE)
for (i in 1:tr$n) {
  v <- getValuesBlock(r, row=tr$row[i], nrows=tr$nrows[i])
  s <- writeValues(s, v, tr$row[i])
}
s <- writeStop(s)
This works fine.
Now trying the same with lapply:
s <- writeStart(s[[1]], filename='test.grd', overwrite=TRUE)
# works with lapply
lapply(1:tr$n, function(x){
  v <- getValues(r, tr$row[x], tr$nrows[x])
  s <- writeValues(s, v, tr$row[x])
})
s <- writeStop(s)
This works fine too.
Now trying mclapply with one core:
s <- writeStart(s[[1]], filename='test.grd', overwrite=TRUE)
# works with mclapply on one core
parallel::mclapply(1:tr$n, function(x){
  v <- getValues(r, tr$row[x], tr$nrows[x])
  s <- writeValues(s, v, tr$row[x])
}, mc.cores = 1)
s <- writeStop(s)
This also works.
Now trying mclapply on multiple cores:
s <- writeStart(s[[1]], filename='test.grd', overwrite=TRUE)
# does not work with multiple cores
parallel::mclapply(1:tr$n, function(x){
  v <- getValues(r, tr$row[x], tr$nrows[x])
  s <- writeValues(s, v, tr$row[x])
}, mc.cores = 2)
s <- writeStop(s)
That does not work, and I understand why. My question is: suppose I have a RasterStack with 2 rasters. Could I use mclapply or another function from the parallel package to structure this process differently, so that the values of a block are read for both grids at the same time, but each core writes its values to only one raster?
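One possible shape for this per-core idea is sketched below, under stated assumptions. Instead of sharing one writeStart() object across processes, each worker takes one layer of the stack and calls writeStart(), writeValues() and writeStop() itself, so every output file and its connection belong to a single process. Note that this reads each layer separately rather than both grids in one call; reading stays blockwise. The two-layer stack (test.grd plus r * 2), the file names and mc.cores = 2 are illustrative choices, and mc.cores > 1 relies on forking, which is not available on Windows.

```r
library(raster)
library(parallel)

# Illustrative two-layer stack: the test file and a derived layer
r  <- raster(system.file("external/test.grd", package = "raster"))
st <- stack(r, r * 2)
tr <- blockSize(st)

# Hypothetical output file names, one per layer
out_files <- file.path(tempdir(), paste0("layer_", seq_len(nlayers(st)), ".grd"))

write_layer <- function(i) {
  lyr <- st[[i]]
  # writeStart() runs inside the worker, so this file connection
  # is created and used by one process only
  out <- writeStart(raster(lyr), filename = out_files[i], overwrite = TRUE)
  for (b in seq_len(tr$n)) {
    v <- getValues(lyr, row = tr$row[b], nrows = tr$nrows[b])
    out <- writeValues(out, v, tr$row[b])
  }
  out <- writeStop(out)
  out_files[i]  # return only the file name, not the raster object
}

# mc.cores > 1 forks the R process; this is not supported on Windows
files  <- mclapply(seq_len(nlayers(st)), write_layer, mc.cores = 2)
result <- stack(unlist(files))
```

Returning file names instead of raster objects keeps anything tied to a worker's connection from crossing the process boundary; the main process simply reopens the finished files.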
For the solution I am looking for, it is not acceptable to first get all the values, save them in an object and then write them blockwise, because my rasters are too large.
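A different sketch that respects this memory constraint keeps writeStart(), writeValues() and writeStop() in the main R process and parallelizes only the reading (and any computation) of blocks, one batch of n_cores blocks at a time, so only one batch of block values is held in memory at once. The v * 2 step is a hypothetical stand-in for real per-block work, and n_cores and the output file name are assumptions. This is likely to pay off only when per-block computation dominates disk I/O.

```r
library(raster)
library(parallel)

r  <- raster(system.file("external/test.grd", package = "raster"))
tr <- blockSize(r)
n_cores <- 2  # assumption; mc.cores > 1 needs forking (not on Windows)

# The writing object is created, used and closed only in this main process
out <- writeStart(raster(r), filename = file.path(tempdir(), "batched.grd"),
                  overwrite = TRUE)

# Group block indices into batches of n_cores blocks
batches <- split(seq_len(tr$n), ceiling(seq_len(tr$n) / n_cores))

for (batch in batches) {
  # Workers only read and compute; only this batch of blocks is in memory
  vals <- mclapply(batch, function(i) {
    v <- getValues(r, row = tr$row[i], nrows = tr$nrows[i])
    v * 2  # hypothetical stand-in for the real per-block computation
  }, mc.cores = n_cores)
  # Write the batch sequentially and in block order
  for (k in seq_along(batch)) {
    out <- writeValues(out, vals[[k]], tr$row[batch[k]])
  }
}
out <- writeStop(out)
```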
I would be very happy if someone has a solution or just an idea or suggestion. Thanks.
I believe the object returned by raster::writeStart() can only be processed in the same R process in which it was created. That is, a parallel R process cannot work with it. The fact that the object uses an external pointer internally is a strong indicator that it cannot be exported to another R process, or saved to file and read back again. You can check for external pointers using the (non-public) future:::assert_no_references(), e.g.
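As a base-R alternative to that internal function, a hypothetical helper such as has_external_pointer() below can walk an object's attributes (which also hold the slots of S4 objects such as RasterLayer) and list elements, looking for values of type "externalptr". It is only a sketch: environments, and anything stored inside them, are not searched. R connections carry their handle as an external pointer in the "conn_id" attribute, which is why an object holding a file connection is flagged.

```r
# Hypothetical helper: recursively look for external pointers in an R object.
# Attributes (including S4 slots) and list elements are searched;
# environments are not.
has_external_pointer <- function(x) {
  if (typeof(x) == "externalptr") return(TRUE)
  for (a in attributes(x)) {
    if (has_external_pointer(a)) return(TRUE)
  }
  if (is.list(x)) {
    for (el in x) {
      if (has_external_pointer(el)) return(TRUE)
    }
  }
  FALSE
}
```

After s <- writeStart(...) as in the question, has_external_pointer(s) can be used to test whether the object is safe to hand to another process. For the native .grd format the raster package appears to keep the open connection on the object, so a TRUE result would be expected there.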