My current R code takes a long time to extract climate data from WorldClim. How can I make it faster?


Currently my code is taking too long to extract climate variables from the WorldClim dataset. I would like to download the climate data from the link below, find the maximum temperature over my species distribution polygons, and save the results as a CSV file in a directory.

The code works, but it takes too long (about 3-4 days on my PC). Can anybody suggest how to improve its performance?

My code is here:

# Download the climate dataset and unzip it. This part works on my PC;
# please suggest improvements to the main code below.
download.file("http://biogeo.ucdavis.edu/data/climate/cmip5/30s/mi85tx50.zip",
              destfile = "E:\\ClimateDataOutputs\\mi85tx50.zip", mode = "wb")
unzip("E:\\ClimateDataOutputs\\mi85tx50.zip", exdir = "E:\\ClimateDataOutputs")
# Code to improve
# load required packages
library(sp)
library(rgdal)
library(raster)
library(lsr)       # for aad()
library(maptools)  # for readShapePoly()
# For Bioclim - Need to project species polygons
projection <- CRS("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
polygons <- readShapePoly("F:\\9. Other projects\\All projected maps\\AllP.shp", proj4string = projection)
polygons$BINOMIAL <- as.character(polygons$BINOMIAL)
species_names <- polygons$BINOMIAL  # avoid shadowing base::names
stats_out <- data.frame(matrix(NA, ncol = 4, nrow = 579))
colnames(stats_out) <- c("BINOMIAL", "AAD", "mean", "obs")
stats_out[, 1] <- species_names
# iterate over species polygons
for (i in 1:579) {
    poly <- polygons[i, ]
    print(poly$BINOMIAL)
    # start empty rather than with a one-row NA data frame, which would
    # inflate the "obs" count by one
    data_out <- data.frame(MaxTemp2050rcp85_MIROC_ESM_CHEM = numeric(0))



    for (j in 1:12) {
        filename <- paste("E:\\ClimateDataOutputs\\mi85tx50", j, ".tif", sep = "")
        ##print(filename)
        grid <- raster(filename)
        ##plot(grid)
        ##plot(poly, add=TRUE)
        data <- extract(grid, poly)
        data1 <- as.data.frame(data)
        colnames(data1) <- c("MaxTemp2050rcp85_MIROC_ESM_CHEM")
        data_out <- rbind(data_out, data1)
    }



    M <- mean(data_out$MaxTemp2050rcp85_MIROC_ESM_CHEM, na.rm = TRUE)
    AAD <- aad(data_out$MaxTemp2050rcp85_MIROC_ESM_CHEM, na.rm = TRUE)
    stats_out$AAD[i] <- AAD
    stats_out$mean[i] <- M
    stats_out$obs[i] <- nrow(data_out)
}


print(stats_out)
write.csv(stats_out, "E:\\ClimateDataOutputs\\MaxTemp2050rcp85_MIROC_ESM_CHEM_AAD.csv")

1 Answer

Accepted answer:

Why don't you stack your rasters?

s <- stack(list.files("E:\\ClimateDataOutputs", "mi85tx50", full.names = TRUE))
data <- extract(s, poly)

That way a single extract() call reads all twelve monthly layers at once instead of opening twelve files per polygon. It may help.
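A sketch of how the stacked raster could replace the entire inner "j" loop (variable names like s and vals are illustrative, and the file pattern assumes the twelve GeoTIFFs sit in E:\ClimateDataOutputs):

```r
library(raster)
library(lsr)  # for aad()

# Build the 12-layer stack once, outside the polygon loop
s <- stack(list.files("E:\\ClimateDataOutputs",
                      pattern = "mi85tx50.*\\.tif$", full.names = TRUE))

for (i in 1:nrow(polygons)) {
    poly <- polygons[i, ]
    # One extract() call returns a matrix of cell values, one column per month
    vals <- extract(s, poly)[[1]]
    stats_out$mean[i] <- mean(vals, na.rm = TRUE)
    stats_out$AAD[i]  <- aad(as.vector(vals), na.rm = TRUE)
    stats_out$obs[i]  <- length(vals)
}
```

The gain comes from extract() cropping the stack to the polygon once and reading all layers together, rather than repeating the crop-and-read work twelve times per species.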

Another option: the line

data_out = rbind(data_out, data1)

is very inefficient, because each rbind() copies the whole growing data frame. In a loop it is always better to preallocate the object first and then fill it, e.g. data_out[j, ] <- data1.
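Applied to the question's inner loop, preallocation could look like this (a sketch; filenames is an assumed character vector holding the twelve monthly file paths):

```r
library(raster)

# Allocate one list slot per month, once, before the loop
monthly <- vector("list", 12)
for (j in 1:12) {
    grid <- raster(filenames[j])
    monthly[[j]] <- unlist(extract(grid, poly))  # fill slot j; no rbind()
}
# Combine once, after the loop, instead of growing data_out twelve times
vals <- unlist(monthly)
```

Filling a preallocated list and combining once at the end avoids the quadratic copying cost of repeated rbind() calls.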

Finally, it's a little harder, but wrap your "j" loop in a function and parallelize the analysis across all of your polygons with parLapply.
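A sketch of that parLapply approach (the function name per_polygon and the worker count are assumptions; s is a RasterStack of the twelve layers built with stack(list.files(...)) as suggested above):

```r
library(parallel)
library(raster)

# Summarise one polygon; assumes `polygons` and a RasterStack `s` exist
per_polygon <- function(i) {
    vals <- unlist(raster::extract(s, polygons[i, ]))
    c(mean = mean(vals, na.rm = TRUE), obs = length(vals))
}

cl <- makeCluster(4)                   # worker count is an assumption
clusterEvalQ(cl, library(raster))      # load raster on each worker
clusterExport(cl, c("polygons", "s"))  # ship the inputs to the workers
res <- do.call(rbind, parLapply(cl, 1:nrow(polygons), per_polygon))
stopCluster(cl)
```

One caveat: file-backed rasters are sent to workers as file references, so every worker must be able to read the same drive paths.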

Also, with this kind of question it's always better to include system.time() output, so we know where the bottleneck is.
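For example, timing the extraction for a single polygon would show whether extract() itself dominates the run time:

```r
# Wall-clock time for one extract() call; "elapsed" is the number to report
system.time({
    vals <- extract(grid, polygons[1, ])
})
```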