How to draw a sequence logo plot based on a data table in R?

I have a data table from which I would like to plot a sequence logo.

Input:

data <- data.frame(
Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)

How can I draw a logo plot from this input (not from alignment files), with "Cns" on the x-axis, "rate" on the y-axis, and the letters in the "variable" column drawn as the logo, sized according to the "rate" column?

There are 2 best solutions below

Best answer

This is a bit of a faff. There was a package for creating sequence logos in R, but it was removed from CRAN last month. You can install and load the latest working version by doing:

# install.packages('devtools') # Uncomment this line if you don't have devtools installed
devtools::install_version('ggseqlogo', '0.1')
library(ggseqlogo)

You then need to get your data into matrix format, which requires a bit of manipulation:

data <- data.frame(
  Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
  variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
  rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)

df <- expand.grid(Cns = unique(data$Cns), variable = unique(data$variable))

df$rate <- unlist(Map(function(x, y) {
  i <- which(data$Cns == x & data$variable == y)
  if(length(i) == 0) return(0) else sum(data$rate[i])
}, df$Cns, df$variable))
  
mat <- matrix(df$rate, nrow = length(unique(data$variable)), byrow = TRUE,
              dimnames = list(unique(data$variable), unique(data$Cns)))
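
As a cross-check on the reshaping step, the same letters-by-positions matrix can probably be built more compactly in base R with tapply(). This is only a sketch, not part of the original answer: the names rows, cols and mat2 are made up for illustration, and the default argument of tapply() needs R 3.4.0 or later.

```r
data <- data.frame(
  Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
  variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
  rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)

# Fix the level order to order of first appearance, so that rows and columns
# follow the same unique() ordering as above (tapply would otherwise sort them).
rows <- factor(data$variable, levels = unique(data$variable))  # hypothetical name
cols <- factor(data$Cns, levels = unique(data$Cns))            # hypothetical name

# Sum rate for every (variable, Cns) pair; pairs absent from the data become 0.
mat2 <- tapply(data$rate, list(rows, cols), sum, default = 0)

dim(mat2)       # letters (rows) by consensus positions (columns)
mat2["Q", "H"]  # rate of letter Q at consensus position H
```

If both approaches are run, all.equal(mat, mat2) is a quick way to confirm that they agree.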

If you want a colorful result that plots the letter heights according to rate, you can then do:

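library(ggplot2)  # for aes() and the scale_*() functions

p <- ggseqlogo(mat, method = 'custom', seq_type = 'other')
# Override the layer mapping so that each letter gets its own fill color
p$layers[[1]]$mapping <- aes(x, y, fill = letter, group = group_by)
p + scale_fill_discrete() +
  scale_x_continuous(breaks = seq_along(unique(data$Cns)),
                     labels = unique(data$Cns))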


Second answer

You could use the ggseqlogo package. Reshape the data into a matrix and pass it to the ggseqlogo() function, as below:

# Reshape to wide format: one row per letter, one column per Cns position
data <- reshape2::dcast(data, variable ~ Cns, fill = 0, value.var = "rate")
data_mat <- as.matrix(data[, -1])
rownames(data_mat) <- data$variable
ggseqlogo::ggseqlogo(data_mat, method = 'custom', seq_type = 'dna') +
  ggplot2::scale_x_continuous("Cns", labels = colnames(data_mat),
                              breaks = seq_len(ncol(data_mat))) +
  ggplot2::labs(y = "Rate")

[Figure: example use of ggseqlogo]

Note: I changed your input data so that the rate values all fall between 0 and 1, inclusive. The code still runs on your original data, but the resulting plot is difficult to read.
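
The note does not say how the rates were changed. One possible way to bring them into the 0 to 1 range, assuming the original values are percentages, is to divide by 100 before reshaping. This is a guess at the rescaling, not necessarily what was done for the figure:

```r
data <- data.frame(
  Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
  variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
  rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)

# Assumption: rate is a percentage, so dividing by 100 maps it into [0, 1].
data$rate <- data$rate / 100

range(data$rate)  # check that all values now lie between 0 and 1
```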