Improving my Heatmap in R with pheatmap()

Hey, I am a complete beginner in R and need to reproduce this heatmap as closely as possible in the next couple of days.

[Image: reference heatmap to reproduce]

My data is an xlsx file with multiple sheets. I cleaned it as well as I could, and it should be in an okay format. I have two time points, T0 and T1, stored as characters, so I cannot run the pheatmap() function with them included. How can I divide the heatmap into T0 and T1 and arrange the Sample IDs in the correct order? Do you have an idea how to create a heatmap as similar as possible to the given example?

This is the heatmap I was able to produce in the last two days without the T0 and T1 data, because they are characters and not numeric values. Now I would like to include them and sort the Sample IDs properly, as in the given heatmap.

[Image: heatmap produced so far]

all_data2 <- cbind(amino,sphingo,hexoses,phospha,lyso,all_data)
matrix_data <- as.matrix(all_data2[, 3:73])
rownames(matrix_data) <- all_data2$`Sample Identification`

heatmap_final <- matrix_data[,!colnames(matrix_data) %in% c('Sample Identification.1','Sample Identification.2','Sample Identification')] 


pheatmap(
 mat               = log2(heatmap_final),
 scale             = "column",
 show_rownames     = TRUE,
 drop_levels       = TRUE,
 fontsize          = 5,
 clustering_method = "complete",
 main              = "Hierarchical Cluster Analysis"
)



This is the code I have now for including the T0 and T1 groups, but I can't run it because the T values are characters. How would you change the code to reproduce the given heatmap as closely as possible and improve on mine?

all_data2 <- cbind(amino,sphingo,hexoses,phospha,lysophospha,acyl)
matrix_data <- as.matrix(all_data2[, 3:74])
rownames(matrix_data) <- all_data2$`Sample Identification`

heatmap_final <- matrix_data[,!colnames(matrix_data) %in% c('Sample Identification.1','Sample Identification.2','Sample Identification','Time point.1','Time point.2')]

pheatmap(
  mat = log2(heatmap_final),
  scale = "column",
  show_rownames = TRUE,
  drop_levels = TRUE,
  fontsize = 5,
  clustering_method = "complete",
  main = "Hierarchical Cluster Analysis"
)
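One possible way to keep the time point in the plot without turning it into a number is to leave it out of the numeric matrix and pass it to pheatmap() as a row annotation. The following is only a sketch under assumptions: the matrix `toy`, the sample names and the `time_point` labels are made up for illustration and stand in for `heatmap_final` and the real `Time point` column. Row clustering is switched off here so that the T0/T1 ordering is preserved, which may or may not match the reference figure.

```r
# Toy stand-in for heatmap_final: 6 samples x 4 metabolites (made-up values, all > 0).
set.seed(1)
toy <- matrix(rexp(24, rate = 0.1) + 1, nrow = 6,
              dimnames = list(paste0("S", 1:6), paste0("Met", 1:4)))

# Hypothetical time point labels, kept as a factor instead of being made numeric.
time_point <- factor(c("T1", "T0", "T1", "T0", "T0", "T1"), levels = c("T0", "T1"))

# Order samples by time point first, then by sample ID within each time point.
ord <- order(time_point, rownames(toy))
toy_sorted <- toy[ord, , drop = FALSE]

# Annotation data frame: its row names must match the row names of the matrix.
annotation <- data.frame(TimePoint = time_point[ord],
                         row.names = rownames(toy_sorted))

# Row index after which a visual gap separates the T0 block from the T1 block.
gap_after <- sum(annotation$TimePoint == "T0")

# Only draw the plot if the pheatmap package is installed.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    mat            = log2(toy_sorted),
    scale          = "column",
    cluster_rows   = FALSE,       # keep the T0/T1 order instead of reclustering rows
    gaps_row       = gap_after,   # gaps are only drawn when rows are not clustered
    annotation_row = annotation,
    show_rownames  = TRUE,
    main           = "Hierarchical Cluster Analysis"
  )
}
```

With real data, the same idea would mean building `annotation` from the time point column of the data frame and sorting the matrix rows before the call.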

Furthermore, any ideas on how I could easily impute the NAs, for example by logspline imputation, so that the resulting data is not changed?

There is 1 best solution below

Re. “the T values are characters”: Are the T characters meant to be TRUE logical values? If so, and assuming you are reading in the data from, say, a CSV file, then you need to either 1) specify the column type when you read it in, or 2) change the column type in the data.frame using as.logical():

Df$mycolumn <- as.logical(Df$mycolumn)
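
Both options can be sketched as follows. This is only an illustration: the CSV content and the column names `sample`, `flag` and `value` are invented, and the file is written to a temporary path so the snippet runs on its own. Note that as.logical() only understands values such as "T", "TRUE", "F" or "FALSE"; a label like "T0" becomes NA, so if the column holds group labels rather than logical values, a factor is probably the better fit.

```r
# Write a tiny made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample,flag,value",
             "S1,T,1.5",
             "S2,F,2.5"), csv_path)

# Option 1: declare the column types while reading the file.
df1 <- read.csv(csv_path,
                colClasses = c(sample = "character",
                               flag   = "logical",
                               value  = "numeric"))

# Option 2: read everything as character, then convert the column afterwards.
df2 <- read.csv(csv_path, colClasses = "character")
df2$flag <- as.logical(df2$flag)

# Labels that are not logical literals cannot be converted and turn into NA.
label_check <- as.logical("T0")

# If the column holds group labels, a factor keeps them usable for grouping.
labels_as_factor <- factor(c("T0", "T1", "T0"), levels = c("T0", "T1"))
```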