I am trying to make a clustered heatmap with pheatmap() in R for cytokine values. I also want to add annotations based on a categorical variable with 4 levels. Although I made sure there is no missing data, many rows of the annotation bar have no color, as seen in the attached picture.
My original data set had 380 samples, but I cleaned the data to keep only rows with no missing values, which brought me down to 154 samples with 41 variables. Here are the variable names:
[1] "tbm_category" "tnfa" "il6" "il33" "il3" "il8" "il7" "ip10" "il10"
[10] "egf" "vegf" "grob" "il1b" "ifng" "il1ra" "mip3a" "il12" "mip1a"
[19] "il31" "mip1b" "il1a" "il4" "mip3b" "il2" "groa" "fractalkine" "fgfbasic"
[28] "eotaxin" "il15" "il5" "gcsf" "pdgfaa" "mcp1" "ifna" "il21" "trail"
[37] "tnfsf5" "il23" "flt3ligand" "il18" "granzymeb"
"tbm_category" is a categorical variable with 4 groups. These are the groups and their counts:
| Definite TBM | Not TBM | Possible TBM | Probable TBM |
|---|---|---|---|
| 26 | 57 | 52 | 19 |
The data is called "cleaned_cyto_ordered" and I ordered it based on "tbm_category".
I also grouped the cytokines together for ease of coding:
numeric_variables <- names(cleaned_cyto_ordered)[names(cleaned_cyto_ordered) != "tbm_category"]
Here is my check to ensure there are no missing values:
if (any_missing) {print("There are missing values in the dataset.")} else {print("There are no missing values in the dataset.")}
[1] "There are no missing values in the dataset."
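The definition of `any_missing` is not shown above. As a minimal sketch of how such a flag could be computed, the snippet below uses a small toy data frame (`toy`, with made-up values) in place of cleaned_cyto_ordered. It also checks the log10-transformed values, since log10() returns -Inf for zeros and NaN for negative inputs, which would not show up in an NA check on the raw data. Whether that applies to the real data is an assumption.

```r
# Toy stand-in for cleaned_cyto_ordered (hypothetical values, not the real data).
toy <- data.frame(
  tbm_category = c("Definite TBM", "Not TBM", "Possible TBM"),
  tnfa = c(0.02, 0.5, 0.001),
  il6  = c(0.3, 0.04, 0.9)
)
numeric_variables <- setdiff(names(toy), "tbm_category")

# TRUE if any cell in the data frame is NA.
any_missing <- any(is.na(toy))

# log10() gives -Inf for 0 and NaN for negative values,
# so also check the transformed matrix for non-finite entries.
logged <- log10(as.matrix(toy[, numeric_variables]))
any_non_finite <- any(!is.finite(logged))

print(any_missing)
print(any_non_finite)
```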
My goal is to cluster the cytokine data and then add the colored annotations based on tbm_category. This is the code I used to create the attached heatmap (I log10-transformed the cytokine values because they tend to be extremely small, and a log10 transform is needed to get any meaningful analysis):
pheatmap(log10(cleaned_cyto_ordered[, numeric_variables]),
annotation_row = data.frame(Category= cleaned_cyto_ordered$tbm_category),
scale = "row",
cluster_rows = FALSE,
show_rownames = FALSE, clustering_distance_cols = "correlation")
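One thing that may be worth checking: according to the pheatmap documentation, rows of annotation_row are matched to rows of the data by row name. A data frame built with data.frame(Category = ...) gets default row names "1", "2", ..., while a filtered data set may still carry its original row names. The sketch below uses a toy data frame (`toy`, hypothetical values and row names) to show the mismatch and one way to align the two. Whether this is what happens in the real data is an assumption.

```r
# Toy stand-in: rows keep non-sequential row names, as they might after filtering.
set.seed(1)
toy <- data.frame(
  tbm_category = rep(c("Definite TBM", "Not TBM"), each = 3),
  tnfa = runif(6, 0.001, 1),
  il6  = runif(6, 0.001, 1),
  il8  = runif(6, 0.001, 1),
  row.names = c("3", "7", "12", "20", "41", "55")
)
numeric_variables <- setdiff(names(toy), "tbm_category")
mat <- log10(as.matrix(toy[, numeric_variables]))

# Annotation built without row names gets "1", "2", ... by default.
ann_default <- data.frame(Category = toy$tbm_category)

# Annotation that reuses the row names of the plotted matrix.
ann_matched <- data.frame(Category = toy$tbm_category,
                          row.names = rownames(mat))

mismatch <- !identical(rownames(ann_default), rownames(mat))
matched  <- identical(rownames(ann_matched), rownames(mat))
print(c(mismatch = mismatch, matched = matched))

# Draw the heatmap only if pheatmap is installed.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat,
                     annotation_row = ann_matched,
                     scale = "row",
                     cluster_rows = FALSE,
                     show_rownames = FALSE,
                     clustering_distance_cols = "correlation")
}
```

Comparing rownames(cleaned_cyto_ordered) with the row names of the annotation data frame would show whether the same mismatch is present in the real data.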