Summarizing factor counts by other variables in R

I'm trying to count how often each level of a factor variable occurs, broken down by other variables in the same dataset. We are running a clinical training study in which patients and controls describe pictures, and I'm analyzing the types of errors patients made. I want to see whether the training condition and session type (baseline, training, post-testing, etc.) affect which errors are produced. The data look like this:

| ParticipantID | Group   | SessionType | TrainingCondition | ErrorType |
| p1            | Control | Baseline    | Alternating       | GE        |
| p1            | Control | Baseline    | Alternating       | RR        |
| p1            | Control | Post-Test   | Alternating       | NT        |
...
| p2            | Patient | Baseline    | Single            | GE        |

SessionType has three levels (Baseline, Immediate Post, 1 Week Post), TrainingCondition has two (Alternating and Single), and ErrorType has five (GE, NS, LE, NT, RR). What I need is a count of how often each level of ErrorType occurred for each combination of Group, SessionType, and TrainingCondition. Ideally, I'd get something like this:

| Group   | SessionType | TrainingCondition | ErrorType | Count |
| Control | Baseline    | Alternating       | GE        | 5     |
| Control | Post-Test   | Alternating       | GE        | 10    |
...
| Patient | Baseline    | Single            | NT        | 7     |
&c.

I've tried several possible solutions, but none have resulted in what I want. The closest is this code using the tidyverse:

error.sum <- df %>% 
  group_by(trainingCondition, Group, SessionType, ErrorType) %>%
  summarise(Count = count(df, ErrorType)$n)

This produced something close, but not quite right. The same set of counts is repeated for every combination in the output:

| Alternating | Control | Baseline | GE | 596  |
| Alternating | Control | Baseline | GE | 46   |
| Alternating | Control | Baseline | GE | 79   |
| Alternating | Control | Baseline | GE | 187  |
| Alternating | Control | Baseline | GE | 500  |
| Alternating | Control | Baseline | GE | 1853 |
| Alternating | Control | Baseline | GE | 37   |
| Alternating | Control | Baseline | NT | 596  |
| Alternating | Control | Baseline | NT | 46   |
| Alternating | Control | Baseline | NT | 79   |
| Alternating | Control | Baseline | NT | 187  |
| Alternating | Control | Baseline | NT | 500  |
| Alternating | Control | Baseline | NT | 1853 |
| Alternating | Control | Baseline | NT | 37   |

I suspect count() counted the overall instances of each error type rather than counts of ErrorType by the other variables? I'm not sure. Any help would be greatly appreciated!
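
That suspicion looks plausible: inside summarise(), count(df, ErrorType) refers to the full data frame rather than the current group, so the same vector of overall counts would be returned for every group. A minimal sketch of one way to get per-combination counts follows, assuming dplyr 1.0 or later and the column names from the example table. The toy df below is hypothetical and only stands in for the real data.

```r
library(dplyr)

# Hypothetical toy data using the column names shown in the question
df <- data.frame(
  ParticipantID     = c("p1", "p1", "p1", "p2", "p2", "p2"),
  Group             = c("Control", "Control", "Control",
                        "Patient", "Patient", "Patient"),
  SessionType       = c("Baseline", "Baseline", "Post-Test",
                        "Baseline", "Baseline", "Baseline"),
  TrainingCondition = c("Alternating", "Alternating", "Alternating",
                        "Single", "Single", "Single"),
  ErrorType         = c("GE", "GE", "NT", "GE", "NT", "NT")
)

# count() groups by the listed columns and returns one row per
# combination that occurs in the data; name = sets the count column name
error.sum <- df %>%
  count(Group, SessionType, TrainingCondition, ErrorType, name = "Count")

# Equivalent form: n() counts the rows of the current group only
error.sum2 <- df %>%
  group_by(Group, SessionType, TrainingCondition, ErrorType) %>%
  summarise(Count = n(), .groups = "drop")

print(error.sum)
```

Both forms only list combinations that actually appear in the data. If the grouping columns are factors and zero-count combinations should appear as well, passing .drop = FALSE to count() or group_by() may be worth trying.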
