Ordered bubble chart in R

I am attempting to draw a bubble plot in R. I have a data frame df with several columns, four of which matter here:

freq   Pass         col        mut
  20     P1     #bfbfff   A_T_S_12
  30     P2     #bfbfff   A_T_S_12
  40     P3     #bfbfff   A_T_S_12
  10     P1     #7879ff  C_G_NS_60
  11     P2     #7879ff  C_G_NS_60
  12     P3     #7879ff  C_G_NS_60
  50     P1     #4949ff  T_C_S_101
  30     P2     #4949ff  T_C_S_101
  20     P3     #4949ff  T_C_S_101
...
  • freq: size of the bubble
  • mut: type of mutation
  • col: colour of the specific passage
  • Pass: passage

After running the code below:

library(packcircles)
library(ggplot2)

cirPack_cutoff <- df[df$freq > 20 & df$freq < 99, ]
   
packing <- circleProgressiveLayout(cirPack_cutoff$freq)

data1 <- cbind(cirPack_cutoff, packing)

data1$col <- ifelse(data1$Pass == "P10", "#bfbfff",
                    ifelse(data1$Pass == "P5", "#7879ff", "#4949ff"))
    
dat.gg <- circleLayoutVertices(packing, npoints=50)
    
    
ggplot() +  
  geom_polygon(data = dat.gg, 
               aes(x, y, group = id, fill=as.factor(id)), 
               colour = "black", lwd = 0.3,  show.legend = FALSE, alpha = 1) + 
  scale_fill_manual(values = df$col)+ 
  theme_classic() +
  coord_equal()

Which outputs this image:

However, the circles for each passage are placed in no particular order. I would like to arrange them in a specific order that follows the mut information: the three circles (P1, P2, and P3) for each type of mutation should sit on the same line.

Any advice on how to approach this?

Desired output

Answer by Allan Cameron

It's difficult to know the expected output for sure from the given description, but it sounds as though you want one cluster for each mutation. This would require running the packing algorithm for each mutation. The plotting code would be easier and more idiomatic if you mapped the Pass to the fill colour.

Here's a full reprex:

library(packcircles)
library(ggplot2)

cirPack_cutoff <- df[df$freq > 20 & df$freq < 99 ,]

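# Pack the circles separately for each mutation, then stack the results
dat.gg <- do.call("rbind", split(cirPack_cutoff, cirPack_cutoff$mut) |>
  lapply(function(d) {
    # Lay out this mutation's circles and convert them to polygon vertices
    dat <- circleProgressiveLayout(d$freq) |>
           circleLayoutVertices(npoints = 50)
    # Attach the source row (Pass, mut, ...) to every vertex of its circle
    cbind(dat, d[dat$id,])
  }))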

ggplot() +  
  geom_polygon(data = dat.gg, 
               aes(x, y, group = interaction(mut, id), fill = Pass), 
               colour = "black", lwd = 0.3, alpha = 1, show.legend = FALSE) +
  scale_fill_manual(values = c("#bfbfff", "#7879ff", "#4949ff")) +
  facet_grid(~mut) +
  coord_equal() +
  theme_void(base_size = 20) 
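
The question also asks for P1, P2 and P3 of each mutation to sit on the same line. If that is meant literally, the packing step can be skipped and the circles placed on a fixed grid instead. Below is a minimal sketch of that idea, assuming circle area should be proportional to freq. The toy data frame, the circle_vertices helper and the 2.2 spacing factor are illustrative choices, not part of the code above.

```r
library(ggplot2)

# Hypothetical toy data mirroring the question's columns
toy <- data.frame(
  freq = c(20, 30, 40, 10, 11, 12, 50, 30, 20),
  Pass = rep(c("P1", "P2", "P3"), times = 3),
  mut  = rep(c("A_T_S_12", "C_G_NS_60", "T_C_S_101"), each = 3)
)

# Radius chosen so that circle area is proportional to freq (an assumption)
toy$r <- sqrt(toy$freq / pi)

# Fixed grid: one column per passage, one row per mutation.
# The step exceeds twice the largest radius, so neighbours cannot overlap.
step   <- 2.2 * max(toy$r)
toy$x0 <- (as.integer(factor(toy$Pass)) - 1) * step
toy$y0 <- -(as.integer(factor(toy$mut)) - 1) * step

# Hypothetical helper: polygon vertices approximating one circle
circle_vertices <- function(x0, y0, r, id, npoints = 50) {
  theta <- seq(0, 2 * pi, length.out = npoints + 1)
  data.frame(x = x0 + r * cos(theta), y = y0 + r * sin(theta), id = id)
}

verts <- do.call(rbind, Map(circle_vertices,
                            toy$x0, toy$y0, toy$r, seq_len(nrow(toy))))

# Attach Pass and mut to every vertex of the corresponding circle
verts <- cbind(verts, toy[verts$id, c("Pass", "mut")])

p <- ggplot(verts, aes(x, y, group = id, fill = Pass)) +
  geom_polygon(colour = "black", lwd = 0.3) +
  scale_fill_manual(values = c("#bfbfff", "#7879ff", "#4949ff")) +
  coord_equal() +
  theme_void()
```

Printing p draws the grid. Because every circle in a mutation shares the same y0, each mutation occupies its own horizontal line, at the cost of the compact look that circle packing gives.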



Data used

There was not enough sample data in the question to generate this image, so the data came from randomly sampling your example:

set.seed(1)

df <- read.table(text =  "freq   Pass         col        mut
                            20     P1     #bfbfff   A_T_S_12
                            30     P2     #bfbfff   A_T_S_12
                            40     P3     #bfbfff   A_T_S_12
                            10     P1     #7879ff  C_G_NS_60
                            11     P2     #7879ff  C_G_NS_60
                            12     P3     #7879ff  C_G_NS_60
                            50     P1     #4949ff  T_C_S_101
                            30     P2     #4949ff  T_C_S_101
                            20     P3     #4949ff  T_C_S_101", 
                 comment.char = "\\", header = TRUE) |>
  lapply(sample, size = 500, replace = TRUE) |>
  as.data.frame()

df <- df[order(df$Pass),]
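
Note that lapply(sample, ...) resamples each column independently, so in the generated df a given col value no longer necessarily sits beside its original mut. The plot above is unaffected because it uses only freq, Pass and mut and sets the fill colours explicitly. If whole rows need to stay intact, sampling row indices is one alternative. A small sketch of the difference, using a made-up two-column data frame (small, by_column and by_row are hypothetical names):

```r
set.seed(1)

# Hypothetical data: each value of `a` is paired with one value of `b`
small <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Column-wise sampling, as in the code above: pairings are not preserved
by_column <- as.data.frame(lapply(small, sample, size = 6, replace = TRUE))

# Row-wise sampling: every sampled row is an intact row of `small`
by_row <- small[sample(nrow(small), size = 6, replace = TRUE), ]
```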