How can I get the median of all measurements within a certain distance?

88 Views Asked by At

There's probably a better way, but since I'm new to R and already had the IDW code set up, I've been trying to get the median of all points within 2000 meters by tweaking the IDW code, setting the weighting power (idp) near zero so closer points are weighted the same as far ones.

I'm guessing it says NA when I run the code below with maxdist=2000 because some points don't have any neighbors within 2000 meters. The smallest maxdist I can get it to work with is ~40,000, even if I set nmin to zero.

Is there a way to tell it to ignore points without neighbors within 2000 meters, or does someone know a better way to do this?

Here's my code:

library(gstat)
clean3145 = read.csv("clean3145.csv")

#Set up the k-fold validation
set.seed(88)
groups <- sample(1:5, nrow(clean3145), replace=TRUE)

#res=result=R=Pearson's correlation between predicted and actual arsenic concentration
MEDres<- rep(NA, 5)

r <- list()
for (k in 1:5) {
  print(k)
  flush.console()
  train <- clean3145[groups!=k, ]
  test <- clean3145[groups==k, ]

  med <- gstat(formula = As1~1, locations = ~UTMNM+UTMEM, data=train, nmin=0, maxdist=40000, set=list(idp = .01))
  medpred <- predict(med, test)$var1.pred
  MEDres[k] <- cor(test$As1, medpred)

  }

#Show the mean correlation for the 5 different training-test dataset pairs in K-fold validation
mean(MEDres)

Thanks for your help!

2

There are 2 best solutions below

0
On

I can't see how your code helps answering your original question, but for a local median I would try

library(sp)
demo(meuse, ask = FALSE)
library(gstat)
x = krige(zinc~1, meuse, meuse.grid, maxdist = 1000, set = list(method = "med"))

If a neighbourhood contains no data, you may define it by the number of nearest points, nmax, in which case, of course, distance is no longer controlled.

2
On

Thanks Edzer!

I'll save that for future reference. We got it to work this way, with depth criteria too (I'm trying to estimate arsenic in groundwater):

#Load required packages and data
library(raster)
depth = read.csv("depth.csv")

Set up the k-fold validation, making sure the same random sample is chosen each time for comparability

set.seed(88)
groups <- sample(1:5, nrow(depth), replace=TRUE)

Compute median arsenic concentration of all training (trn) wells within a certain point distance (pd) of test (tst) wells, using the UTM east and north coordinates in meters (UTMEM, UTMNM). Ignore or "remove" test wells that do not have neighbors within 148 meters (pd>148=NA, na.rm=TRUE)

  computeMed <- function(trn, tst) {
  pd <- pointDistance(trn[ , c('UTMEM', 'UTMNM')], tst[ , c('UTMEM','UTMNM')], lonlat=FALSE)  
  pd[pd > 148] <- NA

  as <- trn$As1
  as <- matrix(rep(as, ncol(pd)), ncol=ncol(pd))
  aspd <- as * (pd >= 0)
  apply(aspd, 2, median, na.rm=TRUE)

    }

Compute medians again, this time with depth criteria (e.g., if test well is near Fallon (Tcan2car=1=wells from Truckee Canal to Carson Basin and downgradient)and more than 40 m deep, only give median of neighbors that are also >40 m deep)

r <- rd <- list()
Fallon <- FALSE
for (k in 1:5) {
  print(k)
  flush.console()
  depth$deep <- TRUE
  depth$deep[depth$Depth_m < 40] <- FALSE
  if (Fallon) {
    d  <- depth[depth$Tcan2car==1]
   } else {
     d <- depth
  }
  train <- d[groups!=k, ]
  test <- d[groups==k, ]

   p <- computeMed(train,test)
   r[[k]] <- cbind(k=k, prd=p, obs=test$As1)


   pdeep <- computeMed(train[train$deep,],test[test$deep,])
   pshallow <- computeMed(train[!train$deep,],test[!test$deep,])


  rd[[k]] <- cbind(k=k, prd=c(pdeep, pshallow),    obs=test$As1[c(which(test$deep), which(!test$deep))])

 }

Show the mean Pearson's R correlation for the 5 different training-test dataset pairs in K-fold validation. cr and r refer to correlations based on distance only. crd and rd refer also include depth criteria

cr <- sapply(r, function(x) {x <- na.omit(x); cor(x[,2:3])[2]})
cr
mean(cr)

crd <- sapply(rd, function(x) {x <- na.omit(x); cor(x[,2:3])[2]})
crd
mean(crd)