I'm trying to create a missingness diagram of my data. I ran the following code:
library(visdat)
library(naniar)
vis_miss(data, sort_miss = TRUE, show_perc = TRUE)
However, the labels are employment.factor or a variation instead of Employment. How do I change this label?
Also, all of the variables in my dataset are included here. How do I select which variables are included in the missingness diagram?
Rather than changing the variable names after plotting, could you copy the variables you want into a new subset of the actual data set under new names, THEN plot? Using the dplyr package:
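A minimal sketch of that idea. The data frame below is a made-up stand-in for your `data` (only `employment.factor` comes from your question; `income.factor` and `id` are hypothetical), so swap in your own column names:

```r
library(dplyr)
library(visdat)

# Hypothetical stand-in for `data`; replace with your real data frame
data <- data.frame(
  employment.factor = factor(c("Employed", NA, "Student", "Employed")),
  income.factor     = factor(c(NA, "Low", "High", NA)),
  id                = 1:4
)

# select() keeps only the listed columns and renames them in one step,
# using the form new_name = old_name
plot_data <- data %>%
  select(Employment = employment.factor, Income = income.factor)

# Plot the renamed subset; `id` is left out of the diagram
vis_miss(plot_data, sort_miss = TRUE, show_perc = TRUE)
```

Because the renaming happens on a copy, the original `data` keeps its column names.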
sort_miss = TRUE, which you have already included, arranges the variables on the x-axis by most missingness. vis_miss returns a ggplot object, so it should be possible to change the labels. This GitHub project seems to provide an example using vis_miss and R's airquality data set: https://github.com/ropensci/visdat/blob/master/R/vis-miss.R
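As a rough illustration with the built-in airquality data: since the result is a ggplot object, ordinary ggplot2 layers such as labs() can be added to it. The label text here is only an example. I have not checked how replacing the x-axis scale interacts with the one vis_miss sets up itself, so for the column labels it is probably safer to rename the columns before plotting.

```r
library(visdat)
library(ggplot2)

# vis_miss() returns a ggplot object
p <- vis_miss(airquality, sort_miss = TRUE)
inherits(p, "ggplot")

# Add ordinary ggplot2 layers, e.g. a new y-axis label and a title
p + labs(y = "Row number", title = "Missingness in airquality")
```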
You can get the order of columns with highest missingness:
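One way to do this in base R, shown on airquality (the object names `na_counts` and `col_order` are my own):

```r
# Count the NAs in each column
na_counts <- colSums(is.na(airquality))

# Column positions ranked from most to least missing
col_order <- order(na_counts, decreasing = TRUE)
col_order
```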
Then get the names of those columns:
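A short sketch, again on airquality, that indexes the column names by that ordering (`cols_by_miss` is just an illustrative name):

```r
# Rank the columns by number of NAs, most missing first
na_counts <- colSums(is.na(airquality))
cols_by_miss <- names(airquality)[order(na_counts, decreasing = TRUE)]

# The two columns with the most missing values
head(cols_by_miss, 2)
```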
Gather the variables together for plotting (column of the row number, then variable, then contents of that variable)
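A possible version of the gathering step using tidyr's pivot_longer() on airquality. The column names `row`, `variable` and `value` are my own choice. All airquality columns are numeric, so they stack without trouble; with mixed types (factors and numbers, say) you would probably need to convert them to a common type first, for example with mutate(across(everything(), as.character)).

```r
library(dplyr)
library(tidyr)

long <- airquality %>%
  mutate(row = row_number()) %>%          # keep track of the original row
  pivot_longer(
    cols      = -row,                     # stack every column except `row`
    names_to  = "variable",               # former column names
    values_to = "value"                   # contents of those columns
  )

head(long)
```

From here, is.na(long$value) gives the missing/present flag for each cell, which is what a missingness plot draws.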
Have you tried pulling up the help documentation with ?naniar, which lists the functions available in the package? There is some explanation of using naniar here: https://cran.r-project.org/web/packages/naniar/vignettes/naniar-visualisation.html