I want to create a heatmap from a data frame that contains heterogeneous data (the table holds numeric values, logicals, character strings, NA, and empty cells). Here is an example dataset that matches my actual dataset. I want to plot "Citizen" on the y-axis and all other variables (columns) on the x-axis.


df <- structure(list(ID = c("ID123", "ID456", "ID523", "ID875", "ID782", 
"ID572", "ID900"), Citizen = c("US", "CN", "MX", "US", "US", 
"CA", "CA"), Ht = c("6", "NA", "5", "6", "5", NA, "6"), Wt = c("200", 
"140", "160", NA, "NA", "175", NA), Age = c("NA", "45", NA, "32", 
"60", "44", "30"), income = c("60", "50", "30", "20", "40", "NA", 
"20"), sex = c("M", "F", "NA", NA, "M", "M", "F"), `Traffic vio` = c(TRUE, 
FALSE, TRUE, FALSE, NA, TRUE, TRUE), Greets = c("Hello", "Bonjour", 
"Hola", "Hi", "Hello", "Hello", "Bonjour")), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

There are 1 best solutions below


The first thing you need to do is replace the character strings "NA" with the NA constant.

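library(dplyr)
# na_if() works on vectors, so apply it to each character column
df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "NA")))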
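
To see what na_if() does on a single vector, here is a minimal sketch. The vector `ht` is a made-up stand-in for the Ht column, not part of the original data.

```r
library(dplyr)

# Hypothetical stand-in for the Ht column: "NA" here is text, not a missing value
ht <- c("6", "NA", "5")

# na_if() swaps every element equal to "NA" for a real NA
ht_clean <- na_if(ht, "NA")

is.na(ht)        # FALSE FALSE FALSE
is.na(ht_clean)  # FALSE  TRUE FALSE
```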

Next, convert the numeric columns, which are currently stored as character.

df <- df %>%
  mutate(across(Ht:income, as.numeric))

You might want your character columns to be factors, especially Citizen, sex and Greets.

df <- df %>%
  mutate(across(where(is.character), factor))

You may want to decide what to do with the NA in Traffic vio: is it more likely TRUE or FALSE? Leave it as is if you prefer. The code below treats it as FALSE.

df <- df %>%
  mutate(`Traffic vio` = if_else(is.na(`Traffic vio`), FALSE, `Traffic vio`))
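
If a shorter form is preferred, dplyr's coalesce() fills missing values in the same way. The sketch below uses a made-up vector `vio` as a stand-in for the `Traffic vio` column.

```r
library(dplyr)

# Hypothetical stand-in for the `Traffic vio` column
vio <- c(TRUE, FALSE, NA, TRUE)

# coalesce() keeps each value and substitutes FALSE where it is NA
vio_filled <- coalesce(vio, FALSE)

vio_filled  # TRUE FALSE FALSE TRUE
```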

You can now make a heatmap using geom_tile from ggplot2. If you want to plot summary statistics, like mean, you should probably aggregate your data ahead of time.

library(ggplot2)

df %>%
  group_by(Citizen, sex) %>%
  summarize(Age = mean(Age, na.rm = TRUE), .groups = "drop") %>%
  ggplot() + 
  geom_tile(aes(x = sex, y = Citizen, fill = Age))
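
To put every variable on the x-axis, as the question asks, one possible approach is to reshape the data to long format with tidyr's pivot_longer(). Columns of different types cannot share a single fill scale, so this sketch makes an assumption: it fills each tile with the share of non-missing values. The table `toy` and the names `long`, `coverage`, `variable`, `value`, and `present` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Small made-up table in the same shape as the question's data
toy <- tibble(
  Citizen = c("US", "CN", "US", "CA"),
  Ht      = c(6, NA, 5, 6),
  Age     = c(NA, 45, 32, 44),
  sex     = c("M", "F", NA, "M")
)

# Long format: one row per (row of toy, variable) cell.
# Mixed types cannot share one value column, so convert to character first.
long <- toy %>%
  mutate(across(-Citizen, as.character)) %>%
  pivot_longer(-Citizen, names_to = "variable", values_to = "value")

# Share of non-missing values per Citizen and variable (an assumed fill choice)
coverage <- long %>%
  group_by(Citizen, variable) %>%
  summarize(present = mean(!is.na(value)), .groups = "drop")

p <- ggplot(coverage) +
  geom_tile(aes(x = variable, y = Citizen, fill = present))
p
```

A different fill, such as a per-variable rescaled mean for the numeric columns, would need its own aggregation step in place of `coverage`.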
