Scatter plot that shows all points with the same value


How can I create a scatter plot in R that shows every point, even when several observations in a category have the same value? Besides the data points, I would also like to show the average value of each category.

For example, suppose I have a data set with two variables, one of which (Cotton weight percent) is treated as a factor:

dat <- structure(list(`Tensile Strength` = c(12L, 19L, 17L, 7L, 25L, 
7L, 14L, 12L, 18L, 22L, 18L, 7L, 18L, 18L, 15L, 10L, 11L, 19L, 
11L, 19L, 15L, 19L, 11L, 23L, 9L), `Cotton weight percent` = c(20L, 
30L, 20L, 35L, 30L, 15L, 25L, 20L, 25L, 30L, 20L, 15L, 25L, 20L, 
15L, 35L, 35L, 25L, 15L, 25L, 35L, 30L, 35L, 30L, 15L)), .Names = c("Tensile Strength", 
"Cotton weight percent"), class = "data.frame", row.names = c(NA, 
-25L))

How can I make a scatter plot like this one (image not shown)?

Here, solid dots are the individual observations and the open circles are the average observed tensile strengths.


There are 3 best solutions below


This can be done in ggplot2 with geom_jitter and stat_summary. Specifically, the geom_jitter would give you the black points on your graph:

library(ggplot2)
# height = 0 restricts the jitter to the x direction, so the y-values stay exact
ggplot(mtcars, aes(factor(cyl), mpg)) +
    geom_jitter(position = position_jitter(width = .1, height = 0))

(The "jitter" adds a small amount of random noise along the x-axis so that points with the same value no longer sit on top of each other, as in your example.)

Then the stat_summary layer lets you add a point for the average of each x value (which I've made large and red):

ggplot(mtcars, aes(factor(cyl), mpg)) +
    geom_jitter(position = position_jitter(width = .1, height = 0)) +
    # `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
    stat_summary(fun = mean, geom = "point", color = "red", size = 3)
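
As a minimal sketch, the same two layers can be applied to the data from the question. The short column names `strength` and `cotton` are assumptions made for this example (the original columns are `Tensile Strength` and `Cotton weight percent`), and open circles (`shape = 1`) are used for the means to match the description in the question. Note that random jitter can still place two identical observations very close together.

```r
library(ggplot2)

# Data from the question, rebuilt with shorter column names
# (`strength` and `cotton` are assumed names for this sketch).
dat <- data.frame(
  strength = c(12, 19, 17, 7, 25, 7, 14, 12, 18, 22, 18, 7, 18, 18, 15,
               10, 11, 19, 11, 19, 15, 19, 11, 23, 9),
  cotton   = c(20, 30, 20, 35, 30, 15, 25, 20, 25, 30, 20, 15, 25, 20, 15,
               35, 35, 25, 15, 25, 35, 30, 35, 30, 15)
)

p <- ggplot(dat, aes(factor(cotton), strength)) +
  # solid dots: individual observations, shifted only horizontally
  geom_jitter(position = position_jitter(width = 0.1, height = 0)) +
  # open circles: mean tensile strength of each category
  stat_summary(fun = mean, geom = "point", shape = 1, size = 4) +
  labs(x = "Cotton weight percent", y = "Tensile Strength")

print(p)
```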



The beeswarm package offers a nice alternative to jittered points. Instead of adding random noise, it provides several methods for arranging the points so that they do not overlap, which can produce, among other things, plots like the following:

(see here for the beeswarm function used to create these plots)

enter image description here
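
A rough sketch of how this might look for the data in the question, assuming the beeswarm package is installed. The variable names (`strength`, `cotton`, `groups`, `means`) are made up for this example, and the plotting part is skipped if the package is missing. Each group is drawn as one swarm at x = 1, 2, ..., so the means are added at those positions.

```r
# Data from the question as two plain vectors (names are assumptions).
strength <- c(12, 19, 17, 7, 25, 7, 14, 12, 18, 22, 18, 7, 18, 18, 15,
              10, 11, 19, 11, 19, 15, 19, 11, 23, 9)
cotton   <- c(20, 30, 20, 35, 30, 15, 25, 20, 25, 30, 20, 15, 25, 20, 15,
              35, 35, 25, 15, 25, 35, 30, 35, 30, 15)

groups <- split(strength, cotton)   # one numeric vector per cotton weight percent
means  <- sapply(groups, mean)      # mean tensile strength per category

if (requireNamespace("beeswarm", quietly = TRUE)) {
  # solid dots: individual observations, arranged so they do not overlap
  beeswarm::beeswarm(groups, pch = 16,
                     xlab = "Cotton weight percent", ylab = "Tensile Strength")
  # open circles: category means, one per swarm position
  points(seq_along(means), means, pch = 1, cex = 2)
}
```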


Using base R:

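# Open circles: individual observations (identical values overlap)
plot(dat[, 1] ~ dat[, 2], ylab = "Tensile Strength", xlab = "Cotton weight percent", cex = 1.5)
# Solid green dots: mean tensile strength of each category
points(sort(unique(dat[, 2])), tapply(dat[, 1], dat[, 2], mean), pch = 16, col = 3, cex = 1.5)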


If you want repeated observations (identical values within a category) to appear side by side instead of on top of each other, you can do this:

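cwp <- sort(unique(dat[, 2]))  # x position of each category
# Count how often each tensile strength occurs in each category (NA = never)
ta <- tapply(1:nrow(dat), list(dat[, 2], dat[, 1]), length)
# For one category at position x: spread repeated values evenly over x - 0.4 ... x + 0.4
ft <- function(v, x) {
  nm <- as.numeric(colnames(v))
  do.call(rbind, lapply(1:length(nm), function(zv) if (v[zv] > 1)
    cbind(rep(x, v[zv]) + seq(.6, 1.4, length.out = v[zv]) - 1, nm[zv]) else c(x, nm[zv])))
}
fd <- lapply(1:nrow(ta), function(z) ft(t(ta[z, !is.na(ta[z, ])]), cwp[z]))
datf <- do.call(rbind, fd)  # column 1: shifted x position, column 2: tensile strength

plot(datf[, 2] ~ datf[, 1], ylab = "Tensile Strength", xlab = "Cotton weight percent", cex = 1.5)
points(sort(unique(dat[, 2])), tapply(dat[, 1], dat[, 2], mean), pch = 16, col = 3, cex = 1.5)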
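
A shorter alternative, offered only as a sketch and not part of the answer above, is base R's stripchart with method = "stack", which places repeated values next to each other rather than spreading them symmetrically around the category. The variable names are assumptions for this example. stripchart draws the groups at x = 1, 2, ..., so the means are added at those positions.

```r
# Data from the question as two plain vectors (names are assumptions).
strength <- c(12, 19, 17, 7, 25, 7, 14, 12, 18, 22, 18, 7, 18, 18, 15,
              10, 11, 19, 11, 19, 15, 19, 11, 23, 9)
cotton   <- c(20, 30, 20, 35, 30, 15, 25, 20, 25, 30, 20, 15, 25, 20, 15,
              35, 35, 25, 15, 25, 35, 30, 35, 30, 15)

groups <- split(strength, cotton)   # one numeric vector per category
means  <- sapply(groups, mean)      # mean tensile strength per category

# solid dots: observations; identical values are stacked side by side
stripchart(groups, method = "stack", vertical = TRUE, pch = 16, offset = 0.5,
           xlab = "Cotton weight percent", ylab = "Tensile Strength")
# open circles: category means
points(seq_along(means), means, pch = 1, cex = 2)
```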
