2D irregular aggregation of a matrix

208 Views Asked by At

I'm trying to bin a symmetric matrix with irregular intervals in R but am not sure how to proceed. My ideas are:

  • Reshape the matrix to long format, aggregate and cast it back?
  • Bin as-is in both dimensions (somehow... tapply, aggregate?)
  • Keep the regular binning but for each of my (larger) irregular bins, replace all inner values with their sum?

Here's an example of what I'm trying to do:

set.seed(42)

# symmetric matrix
a <- matrix(rpois(1e4, 2), 100)
a[upper.tri(a)] <- t(a)[upper.tri(a)]

image(x=1:100, y=1:100, a, asp=1, frame=F, axes=F)

# vector of irregular breaks for binning
breaks <- c(12, 14, 25, 60, 71, 89)

# white line show the desired bins
abline(h=breaks-.5, lwd=2, col="white")
abline(v=breaks-.5, lwd=2, col="white")

symmMat

(The aim being that each rectangle drawn above be filled according to the sum of values within it.) I'd appreciate any pointers of how best to approach this.

1

There are 1 best solutions below

0
On BEST ANSWER

This answer provides a great starting point using tapply:

b <- melt(a)

bb <- with(b, tapply(value, 
    list(
      y=cut(Var1, breaks=c(0, breaks, Inf), include.lowest=T), 
      x=cut(Var2, breaks=c(0, breaks, Inf), include.lowest=T)
    ),
    sum)
)

bb
#          x
# y          [0,12] (12,14] (14,25] (25,60] (60,71] (71,89] (89,Inf]
#  [0,12]      297      48     260     825     242     416      246
#  (12,14]      48       3      43     141      46      59       33
#  (14,25]     260      43     261     794     250     369      240
#  (25,60]     825     141     794    2545     730    1303      778
#  (60,71]     242      46     250     730     193     394      225
#  (71,89]     416      59     369    1303     394     597      369
#  (89,Inf]    246      33     240     778     225     369      230

These can then be plotted as rectangular bins using a base plot and rect — i.e.:

library("reshape2")
library("magrittr")

bsq <- melt(bb)

# convert range notation to numerics
getNum <- . %>%
  # rm brackets
  gsub("\\[|\\(|\\]|\\)", "", .) %>%
  # split digits and convert
  strsplit(",") %>%
  unlist %>% as.numeric

y <- t(sapply(bsq[,1], getNum))
x <- t(sapply(bsq[,2], getNum))

# normalise bin intensity by area
bsq$size <- (y[,2] - y[,1]) * (x[,2] - x[,1])
bsq$norm <- bsq$value / bsq$size

# draw rectangles on top of empty plot
plot(1:100, 1:100, type="n", frame=F, axes=F)
rect(ybottom=y[,1], ytop=y[,2],
     xleft=x[,1], xright=x[,2], 
     col=rgb(colorRamp(c("white", "steelblue4"))(bsq$norm / max(bsq$norm)), 
             alpha=255*(bsq$norm / max(bsq$norm)), max=255),
     border="white")

enter image description here