I'm using doMPI in R to parallelize saving netCDF climate data. The data is stored in R in a 4-dimensional array m, holding 6 variables at 20000 timepoints over a latitude/longitude grid, so m is indexed as m[lon,lat,time,variable]. Based on how netCDF stores its data on disk, the most efficient way to write the data is by timeslice. Consequently, I'd like to iterate over m one timeslice at a time, for each variable. Currently, my code looks like this:
ntime <- 20000
output.vars <- list("rainfall", "snowfallwateq", "snowmelt", "newsnow",
                    "snowdepth", "swe")

for (var.index in seq_along(output.vars)) {
  ncout <- nc_open(output.files[var.index], write = TRUE)
  val <- foreach(time.index = 1:ntime, .packages = c("ncdf4")) %dopar% {
    ncvar_put(ncout, output.vars[[var.index]],
              vals = m[, , time.index, var.index],
              start = c(1, 1, time.index),
              count = c(nlon, nlat, 1))
  }
  nc_close(ncout)
}
This unnecessarily copies the entire m array to each worker, which takes a prohibitively large amount of memory, and I need to reduce the amount of copied data. What I took from this answer is that I could iterate over each timeslice of the array, so that only the data for that timeslice is copied to each worker at each iteration. The foreach construct allows multiple objects to be iterated over simultaneously, so I could even have the time index alongside the matrix timeslice without a problem. Unfortunately, I don't know of any way to iterate over an array by timeslice. Is there a way to do so, such that at each iteration t of the foreach loop for variable var, I have a variable data which holds the 2-dimensional matrix m[,,t,var]?
I've already tried the intuitive approach below, but it iterates over each individual element, rather than an entire timeslice at a time.
val <- foreach(time.index=1:ntime, slice=m[,,,var], ...
If you can massage your data in the main R process, you could try transforming each 2-dimensional slice into a big.matrix from the bigmemory package, and use that in your parallel workers. This would only be useful if the time required to process each slice in the worker process is significant. See the sketch below, and notice that you can nest two foreach loops with %:%.
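A minimal sketch of that idea, assuming a single-machine backend where bigmemory's shared memory is visible to all workers (with doMPI spread across nodes you would need file-backed big.matrix objects on a shared filesystem); the names slice_descs and slice are made up for illustration, and the actual per-slice work is left as a placeholder:

library(bigmemory)
library(foreach)

# In the main process: turn each 2-d timeslice into a big.matrix and keep
# only its small descriptor; workers re-attach through shared memory.
slice_descs <- lapply(seq_along(output.vars), function(v) {
  lapply(1:ntime, function(t) describe(as.big.matrix(m[, , t, v])))
})

val <- foreach(var.index = seq_along(output.vars)) %:%
  foreach(time.index = 1:ntime, .packages = "bigmemory") %dopar% {
    # Only the descriptor travels to the worker, not the slice data itself
    slice <- attach.big.matrix(slice_descs[[var.index]][[time.index]])
    # ... do the per-slice work here (e.g. the ncvar_put call from the
    # question) using slice[, ], which is the 2-d matrix m[,,t,v] ...
    dim(slice)  # placeholder result
  }

Note that slice_descs itself still gets auto-exported, but it only holds small descriptors, not the data.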
If you can't/won't use bigmemory, or if processing each 2-dimensional slice is too fast (yes, this may be problematic for multi-processing, see this answer), maybe you can extract 3-dimensional slices from your data and use .noexport to only copy one at a time, something like the sketch below.
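The following is a rough sketch of that approach, not the original example; it assumes one 3-dimensional slab per variable and uses .noexport to keep m and the full list of slabs out of the automatic export (the names slices_3d and slice_3d are illustrative, and the per-timeslice work is again a placeholder):

library(foreach)

# One 3-dimensional slab (lon x lat x time) per variable, built once in
# the main R process
slices_3d <- lapply(seq_along(output.vars), function(v) m[, , , v])

for (var.index in seq_along(output.vars)) {
  slice_3d <- slices_3d[[var.index]]   # only this slab should be exported
  val <- foreach(time.index = 1:ntime,
                 .noexport = c("m", "slices_3d")) %dopar% {
    timeslice <- slice_3d[, , time.index]  # 2-d matrix for this timepoint
    # ... do the per-timeslice work here, as in the question ...
    dim(timeslice)  # placeholder result
  }
}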
I'm actually not 100% sure the above would prevent copying the whole slices_3d; if not, you may have to manually extract subsets in chunks in the main R process (e.g. slices_3d[1L:num_parallel_workers] and so on each time), and make sure that only one chunk is exported in each call to foreach.