I want to create a heatmap from a data frame that contains heterogeneous data (the table mixes numeric values, logicals, character strings, `NA` and empty cells). I want to plot `Citizen` on the y-axis and all the other variables (columns) on the x-axis. Here is an example dataset that matches the structure of my actual data:
structure(list(ID = c("ID123", "ID456", "ID523", "ID875", "ID782",
"ID572", "ID900"), Citizen = c("US", "CN", "MX", "US", "US",
"CA", "CA"), Ht = c("6", "NA", "5", "6", "5", NA, "6"), Wt = c("200",
"140", "160", NA, "NA", "175", NA), Age = c("NA", "45", NA, "32",
"60", "44", "30"), income = c("60", "50", "30", "20", "40", "NA",
"20"), sex = c("M", "F", "NA", NA, "M", "M", "F"), `Traffic vio` = c(TRUE,
FALSE, TRUE, FALSE, NA, TRUE, TRUE), Greets = c("Hello", "Bonjour",
"Hola", "Hi", "Hello", "Hello", "Bonjour")), row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
The first thing you need to do is to replace the character strings `"NA"` with the `NA` constant. Next, your numeric data should not be stored as character: convert columns such as `Ht`, `Wt`, `Age` and `income` to numeric.
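A minimal base-R sketch of these two steps, assuming the example data from the question. It is rebuilt here as a plain `data.frame` rather than a tibble so that no extra packages are needed, and the list of numeric columns (`num_cols`) is my reading of the example; it would need adjusting for the real data.

```r
# Example data from the question, rebuilt as a plain data.frame
# (the original is a tibble; no extra packages are needed here).
df <- data.frame(
  ID      = c("ID123", "ID456", "ID523", "ID875", "ID782", "ID572", "ID900"),
  Citizen = c("US", "CN", "MX", "US", "US", "CA", "CA"),
  Ht      = c("6", "NA", "5", "6", "5", NA, "6"),
  Wt      = c("200", "140", "160", NA, "NA", "175", NA),
  Age     = c("NA", "45", NA, "32", "60", "44", "30"),
  income  = c("60", "50", "30", "20", "40", "NA", "20"),
  sex     = c("M", "F", "NA", NA, "M", "M", "F"),
  `Traffic vio` = c(TRUE, FALSE, TRUE, FALSE, NA, TRUE, TRUE),
  Greets  = c("Hello", "Bonjour", "Hola", "Hi", "Hello", "Hello", "Bonjour"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 1: turn the literal string "NA" into a real missing value.
# `%in%` returns FALSE (never NA) for values that are already NA.
df[] <- lapply(df, function(col) {
  if (is.character(col)) col[col %in% "NA"] <- NA
  col
})

# Step 2: store the numeric columns as numbers instead of character.
num_cols <- c("Ht", "Wt", "Age", "income")
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
```

Because every `"NA"` string is gone before `as.numeric()` runs, the conversion produces no "NAs introduced by coercion" warnings; a warning at that point would signal some other non-numeric text in a column.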
You might want your character columns to be factors, especially `Citizen`, `sex` and `Greets`. You may also want to decide what to do with the `NA` in `Traffic vio`: is it more likely to be `TRUE` or `FALSE`? You can also leave it as it is.

You can now make a heatmap using `geom_tile()` from ggplot2. If you want to plot summary statistics such as the mean, you should probably aggregate your data ahead of time, because several IDs share the same `Citizen` and would otherwise be drawn as overlapping tiles.
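A sketch of the remaining steps, assuming ggplot2 is installed and starting from the example data already cleaned as described above. It converts the character columns to factors and leaves the `Traffic vio` `NA` in place. Since a mean only makes sense for numeric and logical columns, it aggregates just `Ht`, `Wt`, `Age`, `income` and `Traffic vio` per `Citizen` (the mean of a logical column is the proportion of `TRUE`); `sex` and `Greets` would need their own encoding before they could be coloured. Filling by per-variable z-scores from `scale()` is an assumption of this sketch, chosen so that income and weight do not swamp the other columns on a single colour scale.

```r
library(ggplot2)

# Example data from the question, already cleaned as described above:
# "NA" strings are real NA values and Ht, Wt, Age, income are numeric.
df <- data.frame(
  ID      = c("ID123", "ID456", "ID523", "ID875", "ID782", "ID572", "ID900"),
  Citizen = c("US", "CN", "MX", "US", "US", "CA", "CA"),
  Ht      = c(6, NA, 5, 6, 5, NA, 6),
  Wt      = c(200, 140, 160, NA, NA, 175, NA),
  Age     = c(NA, 45, NA, 32, 60, 44, 30),
  income  = c(60, 50, 30, 20, 40, NA, 20),
  sex     = c("M", "F", NA, NA, "M", "M", "F"),
  `Traffic vio` = c(TRUE, FALSE, TRUE, FALSE, NA, TRUE, TRUE),
  Greets  = c("Hello", "Bonjour", "Hola", "Hi", "Hello", "Hello", "Bonjour"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Character columns as factors (sex and Greets are not plotted below).
fac_cols <- c("Citizen", "sex", "Greets")
df[fac_cols] <- lapply(df[fac_cols], factor)

# The NA in `Traffic vio` is left as-is; to treat it as FALSE instead, use:
# df$`Traffic vio`[is.na(df$`Traffic vio`)] <- FALSE

# One mean per Citizen and variable (rows = Citizen levels, cols = vars).
# Groups with no non-missing values give NaN.
vars  <- c("Ht", "Wt", "Age", "income", "Traffic vio")
means <- sapply(vars, function(v) tapply(df[[v]], df$Citizen, mean, na.rm = TRUE))

# z-score each variable (column) so all variables share one colour scale.
scaled <- scale(means)

# Long format: one row per Citizen x variable (as.vector() is column-major).
long <- data.frame(
  Citizen  = rep(rownames(means), times = ncol(means)),
  variable = rep(colnames(means), each = nrow(means)),
  mean     = as.vector(means),
  scaled   = as.vector(scaled)
)

p <- ggplot(long, aes(x = variable, y = Citizen, fill = scaled)) +
  geom_tile(colour = "white") +
  labs(x = NULL, y = "Citizen", fill = "Scaled mean")
print(p)
```

Citizen/variable combinations with no non-missing values (for example `Ht` for CN) come out as `NaN` and are drawn in the fill scale's grey `na.value`. If raw means are preferred, map `fill = mean` instead of `fill = scaled`.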