Beeswarm plot data input


I'm trying to create a beeswarm plot in R with my specific dataset. I am not an R expert. My data looks like this:

group    p1    p2    p3    p4
A       .01    .1    n/a   1.9
A       2.0    n/a   n/a   .05
A       n/a    n/a   n/a   .3
B       .05    .1    1.0   .5
B       1.0    .02   .054  .01
B       .05    n/a   3.1   .8

What I would like to see is a beeswarm plot with one column each for p1, p2, p3 and p4, where each column shows the points from both groups, colored by group (for example red for 'A' and blue for 'B'). On the y axis I would like to see the actual measurements.

I can also split the data by group if that makes it easier, so there would be one table for 'A' and one table for 'B' that I could overlay on the same plot.

I just don't know how to make the columns correspond to p1, p2, etc. and overlay the different measurements within a column, given my input data.

Best answer:

I am not quite sure how your data is structured, since you did not provide sample data that I could use. If "n/a" is your missing-value indicator, you will probably still run into some trouble, because R only treats "NA" as missing by default.
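The "n/a" problem shows up as soon as the table is read in: read.table() uses na.strings = "NA" by default, so columns containing "n/a" would be read as text instead of numbers. A minimal sketch, assuming the table is pasted in exactly as shown in the question (whitespace-separated, with a header row), that declares "n/a" as the missing-value string:

```r
# The question's table, pasted as text; "n/a" is declared as the missing-value marker
dat <- read.table(text = "
group    p1    p2    p3    p4
A       .01    .1    n/a   1.9
A       2.0    n/a   n/a   .05
A       n/a    n/a   n/a   .3
B       .05    .1    1.0   .5
B       1.0    .02   .054  .01
B       .05    n/a   3.1   .8
", header = TRUE, na.strings = "n/a")

# p1..p4 should now be numeric, with NA wherever "n/a" appeared
str(dat)
```

If the data lives in a file, the same na.strings argument can be passed to read.table() or read.csv() together with your own file name.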

Anyway, here is one way to do it.

Let's produce a sample data set, similar to yours:

set.seed(3)
x <- data.frame(p1 = rnorm(5, 10, 4), p2 = rnorm(5, 40, 10),
                p3 = rnorm(5, 1, 3), p4 = rnorm(5, 6, 4),
                group = sample(c("A", "B"), 5, replace = TRUE))

Notice that the grouping variable is in column five. Now we can produce a beeswarm plot easily, because beeswarm() draws one swarm per column of a data frame:

library("beeswarm")
beeswarm(x[,-5])

Column 5 is left out because it holds the group labels, not measurements.
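In the question's table the group column comes first, not fifth, so x[,-5] on that data would keep group and drop p4. A sketch that selects the measurement columns by name instead, assuming the grouping column is literally named "group" (as in both the question and the sample data); meas is a hypothetical variable name:

```r
library(beeswarm)

set.seed(3)
x <- data.frame(p1 = rnorm(5, 10, 4), p2 = rnorm(5, 40, 10),
                p3 = rnorm(5, 1, 3), p4 = rnorm(5, 6, 4),
                group = sample(c("A", "B"), 5, replace = TRUE))

# Keep every column except "group", wherever it sits in the data frame
meas <- x[, setdiff(names(x), "group")]
beeswarm(meas, ylab = "Measurement")
```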

Now for the colors. The easiest way I could think of was the pwcol ("point-wise color") argument of beeswarm(), which sets the color of each point individually. For that we first have to create a color vector. There is probably a better way to do all this, but this works.

Create a color vector from column 5 that contains 2 where the group is "A" and 3 otherwise. The values 2 and 3 are picked arbitrarily; they are indices into the current color palette (red and green here). Any value that the col argument accepts could be used instead, for example color names such as "red" and "blue".

colors <- ifelse(x$group == "A", 2, 3)

Since the vector is only 5 elements long, it would only cover the points of the first column, so we have to enlarge it: pwcol needs one color value per data point, i.e. 5 × 4 = 20 values here.

colors <- rep(colors, ncol(x[, -5]))
beeswarm(x[, -5], pwcol = colors)
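To get the red/blue scheme from the question and a legend explaining it, the palette indices can be swapped for color names looked up by group. A sketch, where grp_cols is a hypothetical lookup vector; like the code above, it relies on the repeated colors following the points column by column (all of p1, then p2, and so on):

```r
library(beeswarm)

set.seed(3)
x <- data.frame(p1 = rnorm(5, 10, 4), p2 = rnorm(5, 40, 10),
                p3 = rnorm(5, 1, 3), p4 = rnorm(5, 6, 4),
                group = sample(c("A", "B"), 5, replace = TRUE))

# Hypothetical lookup: group label -> color name
grp_cols <- c(A = "red", B = "blue")

# One color per row, repeated once for each of the 4 measurement columns
colors <- rep(unname(grp_cols[x$group]), ncol(x[, -5]))

beeswarm(x[, -5], pwcol = colors, pch = 16, ylab = "Measurement")
legend("topright", legend = names(grp_cols), col = grp_cols, pch = 16)
```

A named lookup vector also scales to more than two groups, where a nested ifelse() would get unwieldy.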

Good luck with your data!
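An alternative that avoids the manual rep() and copes with the missing values in the question's data is to reshape to long format (one row per measurement) and use the formula interface of beeswarm(). This sketch assumes the installed beeswarm version evaluates pwcol inside data for the formula method, as its documented examples do; long and pt_col are hypothetical names. The question's values are typed in directly, with n/a entered as NA:

```r
library(beeswarm)

# The question's data, with n/a entered as NA
dat <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  p1 = c(.01, 2.0, NA, .05, 1.0, .05),
  p2 = c(.1, NA, NA, .1, .02, NA),
  p3 = c(NA, NA, NA, 1.0, .054, 3.1),
  p4 = c(1.9, .05, .3, .5, .01, .8)
)

# stack() returns 'values' plus 'ind', a factor naming the source column
meas_cols <- c("p1", "p2", "p3", "p4")
long <- stack(dat[, meas_cols])

# stack() appends the columns one after another, so repeating the group
# labels once per column lines them up with 'values'
long$group <- rep(dat$group, times = length(meas_cols))

# Drop missing measurements, then assign a color to each remaining point
long <- long[!is.na(long$values), ]
long$pt_col <- ifelse(long$group == "A", "red", "blue")

# One swarm per p column, each point colored by its group
beeswarm(values ~ ind, data = long, pwcol = pt_col, pch = 16,
         xlab = "", ylab = "Measurement")
legend("topright", legend = c("A", "B"), col = c("red", "blue"), pch = 16)
```

Dropping the NA rows before plotting keeps the values and their colors aligned without relying on how beeswarm() itself treats missing values.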