I'm trying to count how often each level of a factor variable occurs within combinations of other variables in the same dataset. We are running a clinical training study in which patients and controls describe pictures, and I'm analyzing the types of errors participants made. I want to see whether the training condition and session type (baseline, training, post-testing, etc.) affect which errors are produced. The data look as follows:
| ParticipantID | Group | SessionType | TrainingCondition | ErrorType |
|---|---|---|---|---|
| p1 | Control | Baseline | Alternating | GE |
| p1 | Control | Baseline | Alternating | RR |
| p1 | Control | Post-Test | Alternating | NT |
| ... | ... | ... | ... | ... |
| p2 | Patient | Baseline | Single | GE |
There are three levels of the SessionType variable (Baseline, Immediate Post, 1 Week Post), two of the TrainingCondition variable (Alternating and Single), and five of the ErrorType variable (GE, NS, LE, NT, RR). What I need is a summary of how often each level of ErrorType occurred within each combination of Group, SessionType, and TrainingCondition. Ideally, I'd get something like this:
| Group | SessionType | TrainingCondition | ErrorType | Count |
|---|---|---|---|---|
| Control | Baseline | Alternating | GE | 5 |
| Control | Post-Test | Alternating | GE | 10 |
| ... | ... | ... | ... | ... |
| Patient | Baseline | Single | NT | 7 |

and so on.
I've tried several possible solutions, but none have resulted in what I want. The closest is this code using the tidyverse:
    library(dplyr)  # part of the tidyverse

    error.sum <- df %>%
      group_by(TrainingCondition, Group, SessionType, ErrorType) %>%
      summarise(Count = count(df, ErrorType)$n)
This gets close, but it isn't right: the same set of counts is repeated for every group in the output:
| TrainingCondition | Group | SessionType | ErrorType | Count |
|---|---|---|---|---|
| Alternating | Control | Baseline | GE | 596 |
| Alternating | Control | Baseline | GE | 46 |
| Alternating | Control | Baseline | GE | 79 |
| Alternating | Control | Baseline | GE | 187 |
| Alternating | Control | Baseline | GE | 500 |
| Alternating | Control | Baseline | GE | 1853 |
| Alternating | Control | Baseline | GE | 37 |
| Alternating | Control | Baseline | NT | 596 |
| Alternating | Control | Baseline | NT | 46 |
| Alternating | Control | Baseline | NT | 79 |
| Alternating | Control | Baseline | NT | 187 |
| Alternating | Control | Baseline | NT | 500 |
| Alternating | Control | Baseline | NT | 1853 |
| Alternating | Control | Baseline | NT | 37 |
I suspect that `count()` tallied each error type across the whole dataset rather than within each combination of the other variables, but I'm not sure. Any help would be greatly appreciated!
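
A minimal sketch of how that suspicion could be checked, assuming dplyr is installed. The `toy` data frame and all of its values are hypothetical (made up for illustration, not the study data); only the column names follow the table above. It contrasts `count(toy, ErrorType)`, which is passed the full data frame and so ignores any grouping, with calling `count()` on all four columns, which appears to be one common dplyr way to get one row per combination:

```r
library(dplyr)

# Hypothetical toy data: column names follow the post, values are made up.
toy <- data.frame(
  ParticipantID     = c("p1", "p1", "p1", "p2", "p2", "p2"),
  Group             = c("Control", "Control", "Control",
                        "Patient", "Patient", "Patient"),
  SessionType       = rep("Baseline", 6),
  TrainingCondition = c("Alternating", "Alternating", "Alternating",
                        "Single", "Single", "Single"),
  ErrorType         = c("GE", "GE", "RR", "GE", "NT", "NT")
)

# Passing the whole data frame to count() tallies each error type overall,
# with no regard to Group, SessionType, or TrainingCondition.
overall <- count(toy, ErrorType)

# Counting rows per combination of the four variables instead;
# `name` sets the name of the resulting count column.
error_sum <- toy %>%
  count(Group, SessionType, TrainingCondition, ErrorType, name = "Count")

print(overall)
print(error_sum)
```

If the suspicion is right, `overall` has one row per error type for the whole dataset, which would explain why the same counts repeat under every group, while `error_sum` has one row per combination that actually occurs in the data. Combinations with zero observations are not listed by this sketch.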