We start with a dataframe of the following form:
my_df <- structure(
list(
a = c(1, 1, 1, 2, 2, 2, 3, 3),
b = c('M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M3'),
c = c(0, 0, 0, 1, 1, 0, 1, 1)
),
.Names = c("ID", "METHOD", "RESULT"),
row.names = c(NA, 8L),
class = "data.frame"
)
In this simplified example, we have three methods (M1, M2, M3), three indivdiuals (1,2,3, and for 3, there are only results for M1 and M3 available), and two possible test results, 0 (negative) and 1 (positive). I would like to get an output that looks as follows:
M1 positive | M1 negative | M2 positive | M2 negative | M3 positive | M3 negative | |
---|---|---|---|---|---|---|
if M1 positive | 100% (XX/XX) | 0% (XX/XX) | % (XX/XX) | % (XX/XX) | % (XX/XX) | % (XX/XX) |
if M1 negative | 0% (XX/XX) | 100% (XX/XX) | % (XX/XX) | % (XX/XX) | % (XX/XX) | % (XX/XX) |
if M2 positive | % (XX/XX) | % (XX/XX) | 100% (XX/XX) | 0% (XX/XX) | % (XX/XX) | % (XX/XX) |
if M2 negative | % (XX/XX) | % (XX/XX) | 0% (XX/XX) | 100% (XX/XX) | % (XX/XX) | % (XX/XX) |
if M3 positive | % (XX/XX) | % (XX/XX) | % (XX/XX) | % (XX/XX) | 100% (XX/XX) | 0% (XX/XX) |
if M3 negative | % (XX/XX) | % (XX/XX) | % (XX/XX) | % (XX/XX) | 0% (XX/XX) | 100% (XX/XX) |
Or in words, I would like to know how much the result of a method, say M1 for example, coincides with the result of another method. As an output, I would love to get (1) the percentages (say, if M1 positive, then 22% of M3 is also positive) and (2) the absolute numbers (e.g., in 1000 that are positive for M1, 220 were positive for M3). Thus, the (XX/XX) stands for, e.g., number of positives for M1/number of positives for M3, in this particular example.
I have taken several approaches, trying to harness ifelse, if_else, and case_when but I am looking for a generic way, preferentially somewhere situated in the tidyverse, of doing it that will provide the array -- a cross-correlation matrix in a way -- I have in mind. Any help on how to aggregate the data would be appreciated.
Output: