Reverse overplotting alpha value in ggplot2


The alpha value in ggplot2 is often used to help with overplotting in R. Darker colors represent areas where many observations fall, and lighter colors represent areas where only a few observations fall. Is it possible to reverse this, so that outliers (typically few observations) appear darker, whereas the bulk of the data (typically many observations) appears lighter?

Below is an MWE:

library(ggplot2)
myDat <- data.frame(x=rnorm(10000,0,1),y=rnorm(10000,0,1))
# I() sets a constant alpha instead of mapping it as an aesthetic
qplot(x=x, y=y, data=myDat, alpha=I(0.2))

The rarer observations away from the center (0,0) are lighter. How can I reverse that? Thank you for any ideas.

There are 2 best solutions below

Best answer

You could try setting the alpha value for each point separately, with opacity increasing further from the center. Something like this:

p = 2 # adjust this parameter to set how steeply opacity increases with distance
d = (myDat$x^2 + myDat$y^2)^p
al = d / max(d)
ggplot(myDat, aes(x=x, y=y))  + geom_point(alpha = al)
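
A self-contained variant of the same idea, as a rough sketch rather than the answerer's exact method: it measures distance from the sample mean instead of assuming the center is (0,0), and keeps a small minimum opacity so the central points do not vanish entirely. The helper name dist_alpha, the min_alpha parameter, and the seed are assumptions made for illustration only.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
myDat <- data.frame(x = rnorm(10000, 0, 1), y = rnorm(10000, 0, 1))

# Hypothetical helper: rescale distance from the sample mean to an
# alpha value in [min_alpha, 1]; p controls how steeply opacity rises.
dist_alpha <- function(x, y, p = 2, min_alpha = 0.02) {
  d <- ((x - mean(x))^2 + (y - mean(y))^2)^p
  min_alpha + (1 - min_alpha) * d / max(d)
}

myDat$al <- dist_alpha(myDat$x, myDat$y)

# scale_alpha_identity() uses the computed values directly as opacities
p1 <- ggplot(myDat, aes(x = x, y = y, alpha = al)) +
  geom_point() +
  scale_alpha_identity()
p1
```

Mapping al inside aes() together with scale_alpha_identity() keeps the per-point values attached to the data frame, which may be convenient if the data are later filtered or faceted.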


Second answer

Try using the Mahalanobis distance from the centroid as an outlier score; points with higher scores can be assigned darker colors instead of relying on alpha values:

library(ggplot2)
myDat <- data.frame(x=rnorm(10000,0,1),y=rnorm(10000,0,1))
mu <- colMeans(myDat)

# assuming x, y independent, if not we can always calculate a non-zero cov(x,y)
sigma <- matrix(c(var(myDat$x), 0, 0, var(myDat$y)), nrow=2) 
# use (squared) *Mahalanobis distance* as outlier score
myDat$outlier.score <- apply(myDat, 1, function(x) t(x-mu)%*%solve(sigma)%*%(x-mu))
qplot(x=x, y=y, data=myDat, col=outlier.score) + 
     scale_color_gradient(low='white', high='blue')


# assuming x, y are not independent: use the full sample covariance matrix
sigma <- cov(myDat[, c("x", "y")])
# use (squared) *Mahalanobis distance* from centroid as outlier score;
# restrict to the x, y columns, since myDat now also holds outlier.score
myDat$outlier.score <- apply(myDat[, c("x", "y")], 1, function(x) t(x-mu)%*%solve(sigma)%*%(x-mu))
qplot(x=x, y=y, data=myDat, col=outlier.score) + 
    scale_color_gradient(low='white', high='blue')
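
As a possible shortcut, base R's stats::mahalanobis() returns the same squared distance without the row-wise apply() loop. The sketch below assumes freshly simulated data and an arbitrary seed; the names xy and p2 are introduced here for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
myDat <- data.frame(x = rnorm(10000, 0, 1), y = rnorm(10000, 0, 1))

xy    <- as.matrix(myDat[, c("x", "y")])
mu    <- colMeans(xy)
sigma <- cov(xy)  # full sample covariance, so correlated x, y are handled too

# mahalanobis() returns the *squared* Mahalanobis distance of each row
myDat$outlier.score <- mahalanobis(xy, center = mu, cov = sigma)

p2 <- ggplot(myDat, aes(x = x, y = y, colour = outlier.score)) +
  geom_point() +
  scale_colour_gradient(low = "white", high = "blue")
p2
```

On a white or light panel background the lowest-scoring points will be nearly invisible, which is the intended effect here; a pale grey for low may be preferable if the central mass should remain faintly visible.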
