I have this dataset. I am trying to find the correlation between a continuous variable, expr, and a categorical variable, WHO_Grade:
> dput(tmp)
structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3",
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4",
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4",
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA,
-33L))
> kruskal.test(expr ~ WHO_Grade, data = tmp)
Kruskal-Wallis rank sum test
data: expr by WHO_Grade
Kruskal-Wallis chi-squared = 19.659, df = 3, p-value = 0.0001998
And here is the boxplot for the same data. As is evident from the boxplot, expression decreases as WHO_Grade increases (grades 1-4 are in increasing severity of disease). Is there a way to obtain a single value (something like a correlation coefficient) that tells me whether the relationship is negative or positive, without having to look at the plot?
The Kruskal-Wallis test evaluates whether there is a significant difference among the categories in the sample as a whole (an overall test); it does not tell you which groups differ or in which direction. I would run pairwise Wilcoxon tests to evaluate the difference in the continuous variable between each pair of groups.
While you technically can run Kruskal-Wallis on each pair, it is intended for comparing three or more groups. The Wilcoxon rank-sum test is the nonparametric test for comparing two groups.
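As a rough sketch of the pairwise approach, something like the following should work in base R. The data are copied from the `dput()` output in the question. The choice of Benjamini-Hochberg adjustment (`"BH"`) is my own assumption, not something the question specifies, so pick whichever multiple-testing correction suits your analysis. The per-grade medians are added only so the direction of the differences can be read without the plot.

```r
# Data copied from the dput() output in the question
tmp <- structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3",
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4",
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4",
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA,
-33L))

# Treat the grade as a factor with levels in order of severity
grade <- factor(tmp$WHO_Grade, levels = c("1", "2", "3", "4"))

# Pairwise Wilcoxon rank-sum tests between all pairs of grades.
# p.adjust.method = "BH" is an assumption; "holm" or "bonferroni" also work.
pw <- pairwise.wilcox.test(tmp$expr, grade, p.adjust.method = "BH")
print(pw)

# Median expression per grade, to see the direction of the differences
med <- tapply(tmp$expr, grade, median)
print(med)
```

`pw$p.value` holds the adjusted p-values as a lower-triangular matrix, one cell per pair of grades. Keep in mind that grade 2 has only three samples here, so comparisons involving it have very little power.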