I want to calculate partial correlations between sets of two variables while controlling for all the other variables in a data frame.
To do this, I used pcor(c("variable1", "variable2", "control1", "control2", etc.), var(dataFrame)) from the ggm package. However, it didn't work: I got NA for the partial correlation coefficient.
My data frame contains personality test scores assessing the participants for neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness:
studentLecturerPersonality <- read.delim("http://www.discoveringstatistics.com/docs/Chamorro-Premuzic.dat", header = TRUE)
names(studentLecturerPersonality) <- c("age", "gender", "studentNeuroticism", "studentExtraversion", "studentOpenness", "studentAgreeableness", "studentConscientiousness","lecturerNeuroticism", "lecturerExtraversion", "lecturerOpenness", "lecturerAgreeableness", "lecturerConscientiousness")
studentLecturerPersonalityOnlyTraits <- subset(studentLecturerPersonality, select = c("studentNeuroticism", "studentExtraversion", "studentOpenness", "studentAgreeableness", "studentConscientiousness"))
I calculated the correlations between the variables using both cor(dataFrame, use = "pairwise.complete.obs", method = "pearson") and cor(variable1, variable2, use = "pairwise.complete.obs", method = "pearson"), where the use argument lets me specify how to deal with missing values (NAs).
I wanted to calculate the partial correlation coefficient between extraversion and neuroticism while controlling for openness to experience, agreeableness, and conscientiousness:
studentLecturerPersonalityOnlyTraitsMatrix <- as.matrix(studentLecturerPersonalityOnlyTraits)
pcExtraversionNeuroticism <- pcor(c("studentExtraversion", "studentNeuroticism",
"studentOpenness",
"studentAgreeableness",
"studentConscientiousness"), var(studentLecturerPersonalityOnlyTraitsMatrix))
pcExtraversionNeuroticism
which returns [1] NA.
I don't know if it's because the data frame contains missing values (NAs), which I didn't (or couldn't) specify how R should deal with (like in cor()).
Can anyone suggest how I can make pcor() work, or an alternative method?
I really appreciate any help you can provide.
First, use complete.cases() to subset the matrix to just the rows that do not contain NA. Then use this reduced matrix, in place of the original one, to compute the partial correlation.
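The two steps above can be sketched roughly as follows. This is only an illustration: the matrix is simulated stand-in data with the question's column names (not the real Chamorro-Premuzic scores), the names traitsMatrix and completeMatrix are hypothetical, and the ggm package is assumed to be installed.

```r
library(ggm)  # provides pcor(); assumed to be installed

# Simulated stand-in for studentLecturerPersonalityOnlyTraitsMatrix
# (hypothetical values, used only so the sketch runs on its own)
set.seed(42)
traitNames <- c("studentNeuroticism", "studentExtraversion", "studentOpenness",
                "studentAgreeableness", "studentConscientiousness")
traitsMatrix <- matrix(rnorm(40 * 5), ncol = 5,
                       dimnames = list(NULL, traitNames))
traitsMatrix[c(3, 17), "studentOpenness"] <- NA  # inject two missing values

# With its default settings, var() propagates the NAs into the covariance
# matrix, which is the likely reason pcor() gave NA in the question
anyNA(var(traitsMatrix))

# Step 1: keep only the rows that have no NA in any column
completeMatrix <- traitsMatrix[complete.cases(traitsMatrix), ]

# Step 2: partial correlation of the first two variables,
# controlling for the remaining three
pcExtraversionNeuroticism <- pcor(c("studentExtraversion", "studentNeuroticism",
                                    "studentOpenness",
                                    "studentAgreeableness",
                                    "studentConscientiousness"),
                                  var(completeMatrix))
pcExtraversionNeuroticism
```

A possible shortcut, if you prefer not to create a second matrix, is var(traitsMatrix, use = "complete.obs"), which applies the same row-wise deletion inside var() itself.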
It is worth noting that this will drop every row that contains an NA in any column, rather than just in the columns you are using. In this case you are using all the columns, so that isn't a problem. However, if you were only using, for example, the first two columns, you might wish to apply complete.cases() to just those columns instead.
As an aside, your variable names are very long. The Style Guide in Advanced R by Hadley Wickham advises striving for names that are concise and meaningful.
You have certainly got meaningful names. This is a matter of taste, but I wonder if they could be a little more concise!
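Coming back to the case where only the first two columns are in use, a minimal sketch of restricting the completeness check to those columns might look like this. The data are again simulated, and the short names (traits, used_cols, traits_used) are hypothetical, chosen only for illustration.

```r
# Simulated stand-in data (hypothetical values); base R only
set.seed(42)
traits <- matrix(rnorm(40 * 5), ncol = 5,
                 dimnames = list(NULL, c("neuro", "extra", "open",
                                         "agree", "consc")))
traits[c(3, 17), "open"] <- NA  # missing values in a column we do not use
traits[8, "extra"] <- NA        # missing value in a column we do use

used_cols <- c("neuro", "extra")

# Check completeness on the columns in use only, then keep just those columns
keep <- complete.cases(traits[, used_cols])
traits_used <- traits[keep, used_cols]

nrow(traits_used)            # 39: only row 8 is dropped
sum(complete.cases(traits))  # 37: checking all columns would drop rows 3, 8, 17

# With no control variables left, this is just the ordinary correlation
cor(traits_used)
```

The point of the design is simply to avoid discarding rows because of NAs in columns that never enter the calculation.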