I want to write code that applies a function calculating Spearman's rank correlation to all combinations of columns in a dataset. I have the following dataset:
library(openxlsx)
data <- read.xlsx("e:/LINGUISTICS/mydata.xlsx", 1)
A B C D
go see get eat
see get eat go
get go go get
eat eat see see
The function cor(rank(x), rank(y), method = "spearman") measures correlation only between two columns, e.g. between A and B:
cor(rank(data$A), rank(data$B), method = "spearman")
But I need to calculate correlation between all possible combinations of columns (AB, AC, AD, BC, BD, CD). I wrote the following function for that:
wert <- function(x, y) { cor(rank(x), rank(y), method = "spearman") }
I do not know how to apply my function to all possible combinations of columns (AB, AC, AD, BC, BD, CD) so that all results are computed automatically, since my real data has many more columns. I would also like the output as a matrix of correlation scores, e.g. like the following table:
    A    B    C    D
A   1    0.3  0.4  0.8
B   0.3  1    0.6  0.5
C   0.4  0.6  1    0.1
D   0.8  0.5  0.1  1
Can somebody help me?
You do not need rank. cor already calculates the Spearman rank correlation with method = "spearman". If you want the correlation between all columns of a data.frame, just pass the data.frame to cor, i.e. cor(data, method = "spearman"). You should study help("cor").
If you want to do this manually, use the combn function.
PS: Your additional challenge is that you actually have factor variables. A rank for an unordered factor is a strange concept, but R just uses collation order here. Since cor rightly expects numeric input, you should do data[] <- lapply(data, as.integer) first.
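To make both routes concrete, here is a minimal sketch. It assumes the sample table from the question is rebuilt inline rather than read from the Excel file, and it wraps each column in as.factor before as.integer so that it also works if the columns arrive as character rather than factor (an assumption on my part, not something the question confirms). The names col_pairs, pair_cor and cor_mat are only illustrative.

```r
# Sample data from the question, built inline instead of read via read.xlsx
data <- data.frame(
  A = c("go", "see", "get", "eat"),
  B = c("see", "get", "go", "eat"),
  C = c("get", "eat", "go", "see"),
  D = c("eat", "go", "get", "see"),
  stringsAsFactors = TRUE
)

# Convert every column to integer codes (levels follow collation order).
# as.factor is a no-op for factors and handles character columns too.
data[] <- lapply(data, function(col) as.integer(as.factor(col)))

# Route 1: pass the whole data.frame to cor to get the full matrix
cor_mat <- cor(data, method = "spearman")

# Route 2: do it manually with combn, one column pair at a time
col_pairs <- combn(names(data), 2)  # 2 x 6 matrix: AB, AC, AD, BC, BD, CD
pair_cor <- apply(col_pairs, 2, function(p) {
  cor(data[[p[1]]], data[[p[2]]], method = "spearman")
})
names(pair_cor) <- apply(col_pairs, 2, paste, collapse = "")

print(cor_mat)
print(pair_cor)
```

The first route already returns the symmetric matrix asked for in the question. The combn route gives one value per unordered pair, which may be handy if only the six pairwise scores are needed.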