I have a data frame with columns: "category", "count", and "phase". Sample data looks like:
| category | count | phase |
|---|---|---|
| International Politics | 4221 | phase0 |
| Economy | 6182 | phase1 |
| Domestic affairs | 1151 | phase0 |
| Occupation | 1892 | phase0 |
| Combat | 1122 | phase0 |
| Domestic affairs | 4221 | phase2 |
| International Politics | 611 | phase2 |
| International Politics | 918 | phase3 |
| Economy | 4282 | phase3 |
| International Politics | 6212 | phase5 |
| Occupation | 5142 | phase4 |
Each phase (phase0, phase1, phase2, and so on) can have a different set of categories. For example, in the sample above the unique categories for phase0 are International Politics, Domestic affairs, Occupation, and Combat, while the unique categories for phase3 are International Politics and Economy. The count column is the count of a category within a given phase.
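To make the example reproducible, the sample rows can be loaded and grouped as below. This is only a minimal sketch assuming pandas; the names `rows` and `categories_by_phase` are illustrative and not part of my real code.

```python
import pandas as pd

# Sample rows copied from the table above (category, count, phase).
rows = [
    ("International Politics", 4221, "phase0"),
    ("Economy", 6182, "phase1"),
    ("Domestic affairs", 1151, "phase0"),
    ("Occupation", 1892, "phase0"),
    ("Combat", 1122, "phase0"),
    ("Domestic affairs", 4221, "phase2"),
    ("International Politics", 611, "phase2"),
    ("International Politics", 918, "phase3"),
    ("Economy", 4282, "phase3"),
    ("International Politics", 6212, "phase5"),
    ("Occupation", 5142, "phase4"),
]
df = pd.DataFrame(rows, columns=["category", "count", "phase"])

# Unique categories seen in each phase, in order of first appearance.
# groupby sorts the phase labels, so keys run phase0, phase1, ...
categories_by_phase = {
    phase: list(group["category"].unique())
    for phase, group in df.groupby("phase")
}

for phase, cats in categories_by_phase.items():
    print(phase, cats)
```

In this sample every (category, phase) pair appears at most once, so there is a single count per combination and not every category occurs in every phase.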
My objective is to check whether the differences in category counts across the phases are statistically significant. I am trying to use an n-way ANOVA test for that. I followed the instructions provided by this site. However, I am still confused about how to implement it. How can I implement this?
Thanks in advance!!