I'm trying to determine whether the frequencies of a categorical variable with 8 levels differ significantly between two groups. In this case, people in two groups were asked their favorite color from 8 choices. I want to know whether the proportion of people in Group 1 picking a given color differs significantly from the proportion in Group 2 picking the same color.
For example, 64.2% of Group 1 picked Orange compared to 53% of Group 2. Is this difference significant? Here is a frequency table produced with tabpct():
tabpct(all_data$Colors, all_data$Group, graph = FALSE)
Column percent
                 all_data$Group
all_data$Colors  Grp 1      %    Grp 2      %
  Red                3   (1.3)       2   (1.0)
  Blue              19   (8.4)      10   (5.0)
  Yellow             1   (0.4)       2   (1.0)
  Green              4   (1.8)       5   (2.5)
  Purple             1   (0.4)       2   (1.0)
  Orange           145  (64.2)     106  (53.0)
  Pink               1   (0.4)       1   (0.5)
  Brown             52  (23.0)      72  (36.0)
  Total            226   (100)     200   (100)
I'm sure there is a simpler way, but I can't seem to figure it out. Any help would be appreciated!
I've tried to fit an ANOVA and run a TukeyHSD test on it, but I get the following error even though the data contain no NA, NaN, Inf, or 0 values:
ColorComp <- aov(Color ~ Group, data = all_data)
TukeyHSD(ColorComp)
> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
> NA/NaN/Inf in 'y'
> In addition: Warning message:
> In storage.mode(v) <- "double" : NAs introduced by coercion
I have also tried regression, which gives the same error.
Testing individual color differences is not statistically valid unless there is some a priori reason that makes just that color the focus of the analysis.
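If one color really had been singled out in advance, the comparison would reduce to a two-sample test of proportions. The sketch below uses Orange purely as an illustration, with counts copied from the table in the question; the object name `orange_test` is made up here, and the result is only interpretable under that a priori assumption.

```r
# Assumption: Orange was chosen as the color of interest BEFORE looking at the data.
# Counts are taken from the tabpct() output in the question.
orange_picked <- c(145, 106)   # Grp 1, Grp 2
group_sizes   <- c(226, 200)   # column totals

# Two-sample test of equal proportions (continuity correction is on by default)
orange_test <- prop.test(x = orange_picked, n = group_sizes)
orange_test$estimate   # sample proportions picking Orange in each group
orange_test$p.value
```

Picking the color after seeing which difference looks largest would invalidate this p-value, which is the point made above.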
A Fisher exact test on the full 8 x 2 table, with the p-value obtained by Monte Carlo simulation, indicates borderline suggestive evidence of a difference in distribution between the groups.
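A minimal sketch of such a test, rebuilding the 8 x 2 table from the counts shown in the question. The object names `counts` and `fisher_mc`, the seed, and the number of replicates `B` are choices made here for illustration, not taken from the original analysis.

```r
# Counts copied from the tabpct() output; matrix() fills column by column
counts <- matrix(
  c(3, 19, 1, 4, 1, 145, 1, 52,    # Grp 1
    2, 10, 2, 5, 2, 106, 1, 72),   # Grp 2
  ncol = 2,
  dimnames = list(
    Colors = c("Red", "Blue", "Yellow", "Green",
               "Purple", "Orange", "Pink", "Brown"),
    Group  = c("Grp 1", "Grp 2")
  )
)

set.seed(1)  # arbitrary seed, only so the simulated p-value is reproducible
fisher_mc <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)
fisher_mc$p.value
```

With the raw data frame, `table(all_data$Colors, all_data$Group)` could be passed in place of `counts`. The simulated p-value will vary slightly with the seed and with `B`.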
A chi-square test can be done but is of doubtful validity.
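One likely reason for the doubt, offered here as an interpretation rather than something stated above, is that several colors were picked by only a handful of people, so many expected cell counts are small and the chi-square approximation becomes unreliable. The sketch below rebuilds the table from the question and inspects the expected counts; the object names are made up for illustration.

```r
# Counts copied from the tabpct() output; matrix() fills column by column
counts <- matrix(
  c(3, 19, 1, 4, 1, 145, 1, 52,    # Grp 1
    2, 10, 2, 5, 2, 106, 1, 72),   # Grp 2
  ncol = 2,
  dimnames = list(
    Colors = c("Red", "Blue", "Yellow", "Green",
               "Purple", "Orange", "Pink", "Brown"),
    Group  = c("Grp 1", "Grp 2")
  )
)

# chisq.test() warns that the approximation may be incorrect for sparse tables;
# the warning is suppressed here only to keep the output readable
chi <- suppressWarnings(chisq.test(counts))
round(chi$expected, 1)     # expected counts under independence
sum(chi$expected < 5)      # how many cells fall below the usual rule of thumb

# One hedge against sparse cells: a simulated p-value instead of the
# chi-square reference distribution
set.seed(1)  # arbitrary seed for reproducibility
chi_mc <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)
chi_mc$p.value
```

Pooling the rare colors into an "Other" category before testing is another common workaround, at the cost of no longer distinguishing those colors.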