How to get a correlation coefficient for a comparison between a continuous and a categorical variable

I have this dataset. I am trying to find the correlation between a continuous variable, expr, and a categorical variable, WHO_Grade:

> dput(tmp)

structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001, 
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002, 
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652, 
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302, 
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251, 
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761, 
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252, 
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988, 
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3", 
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4", 
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4", 
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA, 
-33L))

> kruskal.test(expr ~ WHO_Grade, data = tmp)

    Kruskal-Wallis rank sum test

data:  expr by WHO_Grade
Kruskal-Wallis chi-squared = 19.659, df = 3, p-value = 0.0001998

Here is a boxplot of the same data. It shows expression decreasing as WHO_Grade increases (grades 1 to 4 are in increasing order of disease severity), i.e. a negative relationship. Is there a way to obtain a single value, something like a correlation coefficient, that tells me whether the relationship is negative or positive without having to look at the plot?

[Screenshot: boxplot of expr by WHO_Grade]

Answer:

The Kruskal-Wallis test is an omnibus test: it evaluates whether the distribution of the continuous variable differs between any of the categories, giving one overall result. To see which groups differ, I would run pairwise Wilcoxon rank-sum tests, which compare the continuous variable between each pair of groups.

library(dplyr)  # provides %>% and select_if()
results <- tmp %>%
  select_if(is.numeric) %>%  # keep only the numeric columns (here just expr)
  purrr::map(~ pairwise.wilcox.test(.x, tmp$WHO_Grade, p.adjust.method = "fdr"))
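The pairwise tests return only (FDR-adjusted) p-values, which say whether two grades differ but not in which direction. A minimal base-R sketch, assuming the `tmp` data frame from the question, runs the same pairwise tests without dplyr and adds per-grade medians so the direction can be read off without the plot. The names `pw` and `meds` are illustrative only.

```r
# Data from the question (the dput() output, reformatted)
tmp <- data.frame(
  expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
           6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
           6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
           4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
           6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
           8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
           11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
           6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
           5.03389765403437, 5.07427129999886),
  WHO_Grade = c("4", "3", "3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4",
                "3", "4", "3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4",
                "4", "4", "4", "4", "4", "4", "4")
)

# Wilcoxon rank-sum test for every pair of grades, p-values FDR-adjusted
pw <- pairwise.wilcox.test(tmp$expr, tmp$WHO_Grade, p.adjust.method = "fdr")
print(pw)

# The p-values carry no sign; the per-grade medians show which way expr moves
meds <- tapply(tmp$expr, tmp$WHO_Grade, median)
print(meds)
```

For this data the medians fall from grade 1 to grade 4, which matches what the boxplot shows.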

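While you technically can run Kruskal-Wallis on each pair of groups, it is intended for comparing three or more groups; with only two groups it is equivalent to the Wilcoxon rank-sum (Mann-Whitney) test with the normal approximation and no continuity correction. Like Kruskal-Wallis, the Wilcoxon rank-sum test is nonparametric, and it is the standard choice for comparing two groups.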
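If a single signed number is what is wanted, one option not used in the answer above is to treat WHO_Grade as ordinal (the question states that grades 1 to 4 are in increasing severity) and compute a rank correlation such as Spearman's rho or Kendall's tau-b. This is a swapped-in technique; it relies only on the order of the grades, not on equal spacing between them. A negative coefficient means expr tends to fall as the grade rises. The sketch below assumes the `tmp` data frame from the question; `grade`, `sp`, `kd`, `rho` and `tau` are illustrative names.

```r
# Data from the question (the dput() output, reformatted)
tmp <- data.frame(
  expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
           6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
           6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
           4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
           6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
           8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
           11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
           6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
           5.03389765403437, 5.07427129999886),
  WHO_Grade = c("4", "3", "3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4",
                "3", "4", "3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4",
                "4", "4", "4", "4", "4", "4", "4")
)

# Assumption: treat WHO_Grade as ordinal, "1" < "2" < "3" < "4"
grade <- as.integer(tmp$WHO_Grade)

# Spearman's rho; exact = FALSE because grade contains many ties
sp <- cor.test(tmp$expr, grade, method = "spearman", exact = FALSE)
# Kendall's tau-b (adjusted for ties), normal-approximation p-value
kd <- cor.test(tmp$expr, grade, method = "kendall", exact = FALSE)

rho <- unname(sp$estimate)
tau <- unname(kd$estimate)
cat("Spearman rho:", round(rho, 3), " p =", signif(sp$p.value, 3), "\n")
cat("Kendall tau-b:", round(tau, 3), " p =", signif(kd$p.value, 3), "\n")
```

Unlike the Kruskal-Wallis chi-squared, which is never negative, both coefficients are signed, so their sign answers the "negative or positive" question directly.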