Add stars to p-value


I have calculated the ANOVA F-test p-values for differences in means for several variables. Now I would like to add "stars" that indicate the significance level of each p-value: * for significance at the 10% level, ** at the 5% level and *** at the 1% level.

My data looks like this:

structure(list(Variables = c("A", "B", "C", "D", "E"), 
               `Anova F-Test p-Value` = c(0.05, 5e-04, 0.5, 0.05, 0.01)), 
          class = "data.frame", row.names = c(NA, -5L))

Could someone help me with the code here?

Best answer:

You can build your own function. Note, however, that this is not the conventional star scale (that is fine as long as you state the scale somewhere). See e.g. here.

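stars.pval <- function(x){
  stars <- c("***", "**", "*", "n.s.")
  cutoffs <- c(0, 0.01, 0.05, 0.10, 1)
  # left.open = TRUE: intervals are (a, b], so p <= alpha counts as significant;
  # rightmost.closed = TRUE then closes the leftmost interval, [0, 0.01]
  i <- findInterval(x, cutoffs, left.open = TRUE, rightmost.closed = TRUE)
  stars[i]
}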

transform(dat, stars = stars.pval(dat$`Anova F-Test p-Value`))

  Variables Anova.F.Test.p.Value stars
1         A                5e-02    **
2         B                5e-04   ***
3         C                5e-01  n.s.
4         D                5e-02    **
5         E                1e-02   ***
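
A quick way to check how this function treats p-values that sit exactly on a cutoff is to feed it the cutoffs themselves. The sketch below is only an illustration: it repeats the helper under the hypothetical name stars_findinterval so that it runs on its own, and boundary is a made-up test vector. Because left.open = TRUE closes each interval on the right, a p-value equal to a cutoff counts as significant at that level.

```r
# Same logic as stars.pval() above, under a hypothetical name
stars_findinterval <- function(p) {
  stars   <- c("***", "**", "*", "n.s.")
  cutoffs <- c(0, 0.01, 0.05, 0.10, 1)
  # (a, b] intervals; the leftmost one, [0, 0.01], is closed on both sides
  i <- findInterval(p, cutoffs, left.open = TRUE, rightmost.closed = TRUE)
  stars[i]
}

# Illustrative p-values lying exactly on each cutoff, including 0 and 1
boundary <- c(0, 0.01, 0.05, 0.10, 1)
print(data.frame(p = boundary, stars = stars_findinterval(boundary)))
```

Here 0.01 gets ***, 0.05 gets ** and 0.10 gets *, i.e. the rule is p <= alpha.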

Answer:

I would suggest using cut() for this.

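Edit: a note on boundaries. With right = FALSE the intervals are closed on the left, [a, b), so only p < alpha counts as significant (0.05 gets * rather than **, as in the output below); use right = TRUE, the default in cut(), if p <= alpha should count as significant. I also replaced the outer breaks 0 and 1 with -Inf and Inf: with 0 and 1, a p-value of exactly 1 (with right = FALSE) or exactly 0 (with right = TRUE) falls outside every interval and becomes NA unless include.lowest = TRUE is set.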

dt$stars <- cut(dt[[2]], breaks = c(-Inf, 0.01, 0.05, 0.10, Inf), 
                labels = c("***", "**", "*", "n.s."), right = FALSE)

dt

#   Variables Anova F-Test p-Value stars
# 1         A               0.0500     *
# 2         B               0.0005   ***
# 3         C               0.5000  n.s.
# 4         D               0.0500     *
# 5         E               0.0100    **
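
To make the effect of right concrete, the sketch below runs the same cut() call with both settings on a few p-values that fall exactly on the cutoffs. The vector p and the names cut_breaks, star_labels, strict and inclusive are illustrative only and do not come from the question's data.

```r
# Illustrative p-values, including values exactly on the 1%, 5% and 10% cutoffs
p <- c(0.005, 0.01, 0.05, 0.10, 0.5)
cut_breaks  <- c(-Inf, 0.01, 0.05, 0.10, Inf)
star_labels <- c("***", "**", "*", "n.s.")

# right = FALSE: intervals [a, b), so only p < alpha counts as significant
strict <- cut(p, breaks = cut_breaks, labels = star_labels, right = FALSE)

# right = TRUE (the default): intervals (a, b], so p <= alpha counts
inclusive <- cut(p, breaks = cut_breaks, labels = star_labels, right = TRUE)

print(data.frame(p, strict, inclusive))
```

cut() returns a factor; wrap the result in as.character() if you prefer a plain character column.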

Answer:

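There is a built-in R function for this, symnum() from the stats package. Note that the cutpoints below follow R's usual scale, with *** reserved for p <= 0.001 and . marking p <= 0.1, rather than the scale in the question: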

df$stars <- symnum(df$`Anova F-Test p-Value`,
                   symbols   = c("***", "**", "*", ".", "n.s."),
                   cutpoints = c(0, .001, .01, .05, .1, 1),
                   corr      = FALSE)
df
  Variables Anova F-Test p-Value stars
1         A                5e-02     *
2         B                5e-04   ***
3         C                5e-01  n.s.
4         D                5e-02     *
5         E                1e-02    **
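
If you want symnum() to follow the question's scale instead (*** at 1%, ** at 5%, * at 10%), it should be enough to drop the 0.001 cutpoint and the "." symbol, as in this sketch. The p-values are copied from the question's data; legend = FALSE only suppresses the legend attribute. As the output above shows, symnum() closes its intervals on the right (p <= alpha), so the result should match the findInterval() answer.

```r
# p-values copied from the question's data
p <- c(0.05, 5e-04, 0.5, 0.05, 0.01)

# Question's scale; symnum() needs exactly one more cutpoint than symbols
stars <- symnum(p,
                cutpoints = c(0, 0.01, 0.05, 0.10, 1),
                symbols   = c("***", "**", "*", "n.s."),
                corr      = FALSE,
                legend    = FALSE)

# symnum() returns a "noquote" object; convert it to a plain character vector
print(as.character(stars))
```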

Answer:

The gtools package has a stars.pval() function that takes a numeric vector of p-values and returns stars using R's standard definition.
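
A usage sketch for this route, assuming gtools has been installed from CRAN (install.packages("gtools")); the requireNamespace() guard simply skips the example otherwise. "R's standard definition" uses the conventional cutoffs 0.001, 0.01, 0.05 and 0.1, so, unlike the question's scale, *** is reserved for p <= 0.001. The name res is illustrative only.

```r
# p-values copied from the question's data
p <- c(0.05, 5e-04, 0.5, 0.05, 0.01)

# gtools is a CRAN package, not part of base R: skip quietly if it is missing
if (requireNamespace("gtools", quietly = TRUE)) {
  # as.character() makes sure the new column is a plain character vector
  res <- data.frame(p = p, stars = as.character(gtools::stars.pval(p)))
  print(res)
}
```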