stars package: how to define additional dimensions based on an attribute (filename)?

1k Views Asked by At

I have a set of raster files (in this case downloaded from http://www.paleoclim.org/) that I am reading into R using the stars package.

library("tidyverse")
library("fs")
library("stars")

data_path <- "./paleoclim"
(data_files <- list.files(data_path, pattern = "*.tif"))
#>   [1] "BA_v1_2_5m_bio_1_badia.tif"             
#>   [2] "BA_v1_2_5m_bio_10_badia.tif"            
#>   [3] "BA_v1_2_5m_bio_11_badia.tif"   
#> [...]         
#>  [39] "EH_v1_2_5m_bio_1_badia.tif"             
#>  [40] "EH_v1_2_5m_bio_10_badia.tif"            
#>  [41] "EH_v1_2_5m_bio_11_badia.tif"       
#> [...]         
#>  [58] "HS1_v1_2_5m_bio_1_badia.tif"            
#>  [59] "HS1_v1_2_5m_bio_10_badia.tif"           
#>  [60] "HS1_v1_2_5m_bio_11_badia.tif"           
#> [...]         

(paleoclim <- read_stars(path(data_path, data_files)))
#> stars object with 2 dimensions and 133 attributes
#> attribute(s):
#>  BA_v1_2_5m_bio_1_badia.tif  BA_v1_2_5m_bio_10_badia.tif 
#>  Min.   :101.0               Min.   :213.0               
#>  1st Qu.:166.0               1st Qu.:278.0               
#>  Median :173.0               Median :298.0               
#>  Mean   :171.8               Mean   :290.3               
#>  3rd Qu.:180.0               3rd Qu.:304.0               
#>  Max.   :200.0               Max.   :325.0               
#> [...]         
#> dimension(s):
#>   from to offset      delta refsys point values    
#> x    1 72     36  0.0416667 WGS 84 FALSE   NULL [x]
#> y    1 48     33 -0.0416667 WGS 84 FALSE   NULL [y]

Created on 2020-12-07 by the reprex package (v0.3.0)

The filenames contain two pieces of information that I would like to represent as dimensions of the stars object, e.g. HS1_v1_2_5m_bio_1_badia.tif refers to period "HS1" and bioclimatic variable "bio_1".

I've got as far as using st_redimension() to create the new dimensions and levels:

periods <- str_extract(names(paleoclim), "[^_]+")
biovars <- str_extract(names(paleoclim), "bio_[0-9]+")

paleoclim %>% 
  merge() %>% 
  st_redimension(
    new_dims = st_dimensions(x = 1:72, y = 1:48, 
                             period = unique(periods),
                             biovar = unique(biovars))
  )
#> stars object with 4 dimensions and 1 attribute
#> attribute(s):
#>        X          
#>  Min.   :  -91.0  
#>  1st Qu.:   26.0  
#>  Median :   78.0  
#>  Mean   :  588.2  
#>  3rd Qu.:  256.0  
#>  Max.   :11275.0  
#> dimension(s):
#>        from to offset delta refsys point          values    
#> x         1 72      1     1     NA FALSE            NULL [x]
#> y         1 48      1     1     NA FALSE            NULL [y]
#> period    1  7     NA    NA     NA FALSE      BA,...,YDS    
#> biovar    1 19     NA    NA     NA FALSE bio_1,...,bio_9

But this doesn't actually map the values of the attributes (filenames) to the levels of the new dimensions. Also, most of the information (e.g. CRS) about the original x and y dimensions are lost because I have to recreate them manually.

How do you properly define new dimensions of a stars object based on another dimension or attribute?

1

There are 1 best solutions below

0
On

Don't see a straightforward way to split one dimension into two after all files have been read into a three-dimensional stars object. An alternative approach you could use is:

  1. read one folder at a time, where all files of that folder go into the variable third dimension, stored as separate stars objects in a list,
  2. then combine the resulting stars objects, where the stars objects go into the period fourth dimension.

For this example, I downloaded the following two products and unzipped into two separate folders:

Here is the code:

library(stars)

# Directories with GeoTIFF files
paths = c(
    "/home/michael/Downloads/BA_v1_10m", 
    "/home/michael/Downloads/HS1_v1_10m"
)

# Read the files and set 3rd dimension
r = list()
for(i in paths) {
    files = list.files(path = i, pattern = "\\.tif$", full.names = TRUE)
    r[[i]] = read_stars(files)
    names(r[[i]]) = basename(files)
    r[[i]] = st_redimension(r[[i]])
}

# Combine the list
r = do.call(c, r)

# Attributes to 4th dimension
names(r) = basename(paths)
r = st_redimension(r)

# Clean dimension names
r = st_set_dimensions(r, names = c("x", "y", "variable", "period"))
r

and the printout of the result:

## stars object with 4 dimensions and 1 attribute
## attribute(s), summary of first 1e+05 cells:
##  BA_v1_10m.HS1_v1_10m 
##  Min.   :-344.0       
##  1st Qu.:-290.0       
##  Median :-274.0       
##  Mean   :-264.8       
##  3rd Qu.:-252.0       
##  Max.   :-128.0       
##  NA's   :94073        
## dimension(s):
##          from   to  offset     delta refsys point                  values x/y
## x           1 2160    -180  0.166667 WGS 84 FALSE                    NULL [x]
## y           1 1072 88.6667 -0.166667 WGS 84 FALSE                    NULL [y]
## variable    1   19      NA        NA     NA    NA bio_1.tif,...,bio_9.tif    
## period      1    2      NA        NA     NA    NA  BA_v1_10m , HS1_v1_10m

The result is a stars object with four dimensions, including x, y, variable, and period.

Here are plots, separately for each of the two levels in the period dimension:

plot(r[,,,,1,drop=TRUE])

plot1

plot(r[,,,,2,drop=TRUE])

plot2