identify nature of missingness for categorical variables

Could you please give me some hints for identifying the nature of missingness in categorical variables? I did a quick search on Google Scholar but didn't find anything related to this. How can I tell whether missing values are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)? Apart from studying the domain, I can't think of anything. Links to some papers would be appreciated. Thanks in advance. (I'll be working in a SAS environment, but the question is not specific to that language.)

There is 1 best solution below

Since you've tagged this as SAS, one approach is to create a boolean flag variable for each of your categorical variables, indicating whether it is missing in each row. You can then run whatever analysis you like on the frequency of missing values, using the flags. For example, you could use proc corr to see whether missingness in one variable correlates with the values of other variables.
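
With several categorical variables, the flags can be built in one pass with arrays. This is only a minimal sketch: the dataset name have, the variables region and product, and the flag names are hypothetical placeholders, not anything from your data.

data flagged;
    set have;                                        /* hypothetical input dataset */
    array cvars {*} $ region product;                /* hypothetical categorical (character) variables */
    array flags {*} region_missing product_missing;  /* one numeric flag per variable, same order */
    do i = 1 to dim(cvars);
        flags{i} = missing(cvars{i});                /* 1 if missing in this row, 0 otherwise */
    end;
    drop i;
run;

/* The mean of a 0/1 flag is the proportion of rows with a missing value */
proc means data = flagged n mean sum;
    var region_missing product_missing;
run;

The two arrays have to list their variables in the same order, since the loop pairs them by position.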

For example, suppose SEX is set to missing whenever AGE exceeds 14, so that the missingness depends on an observed variable:

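data example;
    set sashelp.class;
    /* Simulate missingness that depends on an observed variable */
    if AGE > 14 then call missing(SEX);
    /* 1 if SEX is missing in this row, 0 otherwise */
    SEX_MISSING_FLAG = missing(SEX);
run;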

You could then detect that dependence by running the following:

proc corr data = example outp= corr;
    var age weight height sex_missing_flag;
run;

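Output (the corr dataset written by outp=; note that SEX_MISSING_FLAG correlates most strongly with Age, at 0.78):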

_TYPE_,_NAME_,Age,Weight,Height,SEX_MISSING_FLAG
MEAN,,13.32,100.03,62.34,0.26
STD,,1.49,22.77,5.13,0.45
N,,19.00,19.00,19.00,19.00
CORR,Age,1.00,0.74,0.81,0.78
CORR,Weight,0.74,1.00,0.88,0.64
CORR,Height,0.81,0.88,1.00,0.55
CORR,SEX_MISSING_FLAG,0.78,0.64,0.55,1.00
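
Because the flag is binary, group comparisons may be easier to read than a Pearson correlation. The sketch below reuses the example dataset from above. It is one possible follow-up rather than a definitive test of the missingness mechanism, and the choice of proc ttest and proc freq here is my own suggestion.

/* Compare the observed numeric variables between rows with and without a missing SEX */
proc ttest data = example;
    class sex_missing_flag;
    var age weight height;
run;

/* Cross-tabulate the flag against a discrete variable.
   Fisher's exact test is requested because the sample is small (19 rows);
   with more data, the chisq option would be the usual choice. */
proc freq data = example;
    tables age * sex_missing_flag / nopercent nocol fisher;
run;

For a categorical covariate, the proc freq step is the relevant one: put that covariate in place of age. Keep in mind that checks like these can only provide evidence against MCAR. They cannot separate MAR from MNAR, because MNAR depends on the unobserved values themselves, so domain knowledge is still needed for that distinction.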