How do I sum instances of a categorical value without using table in R?

I want to count how many times each value of one variable appears for each value of another. Example data:

df <- data.frame(hour = c("1", "2", "1", "2", "3", "2", "3"), name = c("A", "B", "A", "B", "C", "A", "B"))

Using table (table(df$hour, df$name)) gives me exactly the right counts, but I don't want a table object: I want to draw a heat map in ggplot and need a data frame. I have been pulling my hair out; there has to be an easy way.

There are 2 best solutions below

The table output can be converted to a data frame. Use one of these, depending on the output desired:

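# wide: one row per hour (as row names), one column per name
as.data.frame.matrix(table(df))

# wide, with hour as an ordinary column instead of row names
library(tibble)
rownames_to_column(as.data.frame.matrix(table(df)), "hour")

# long: one row per hour/name pair, with the count in Freq
as.data.frame(table(df))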
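
For the ggplot heat map asked about in the question, the long form is the one that maps directly onto aes(). A minimal sketch of what it contains for the example data (the variable name counts is only illustrative):

```r
# Example data from the question
df <- data.frame(hour = c("1", "2", "1", "2", "3", "2", "3"),
                 name = c("A", "B", "A", "B", "C", "A", "B"))

# Long format: columns hour, name (both factors) and Freq (the count)
counts <- as.data.frame(table(df))
str(counts)

# Combinations that never occur are kept with Freq = 0,
# so every tile of the heat map has a row
counts[counts$Freq == 0, ]
```

Because the zero-count combinations are present, geom_tile() draws a complete grid rather than leaving holes.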

heatmap

Regarding heatmaps, note that heatmap in base R accepts table output directly (as does gplots::balloonplot, not shown here):

heatmap(table(df))

It can also be done with ggpubr::ggballoonplot, lattice::levelplot or ggplot2, using as.data.frame(table(df)):

library(ggpubr)
ggballoonplot(as.data.frame(table(df)))

library(lattice)
levelplot(Freq ~ hour * name, as.data.frame(table(df)))

library(dplyr)
library(ggplot2)
df %>% 
  table %>% 
  as.data.frame %>% 
  ggplot(aes(hour, name, fill = Freq)) + geom_tile()

The output looks like this (see Note at end for code that generated this):

[Image: the four plots arranged in a 2 x 2 grid]

Note

df <- structure(list(hour = c("1", "2", "1", "2", "3", "2", "3"), name = c("A", 
"B", "A", "B", "C", "A", "B")), class = "data.frame", row.names = c(NA, 
-7L))

library(cowplot)
library(gridGraphics)

heatmap(table(df), main = "heatmap")
# convert from classic to grid graphics to later combine
grid.echo()
p1 <- grid.grab()

library(ggpubr)
p2 <- ggballoonplot(as.data.frame(table(df))) + 
  ggtitle("ggpubr::ggballoonplot")

library(lattice)
p3 <- levelplot(Freq ~ hour * name, as.data.frame(table(df)), 
  main = "lattice::levelplot")

library(magrittr)
library(ggplot2)

p4 <- df %>% 
  table %>% 
  as.data.frame %>% 
  ggplot(aes(hour, name, fill = Freq)) + geom_tile() + ggtitle("ggplot2")

plot_grid(p2, p3, p4, p1, nrow = 2)

I want to do a heat map in ggplot and need a dataframe.

An option might be stat_bin_2d()

library(ggplot2)
ggplot(df, aes(hour, name)) +
  stat_bin_2d()

Result

[Image: heat map produced by stat_bin_2d()]

From ?stat_bin_2d:

Divides the plane into rectangles, counts the number of cases in each rectangle, and then (by default) maps the number of cases to the rectangle's fill.
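
If the counts themselves are needed as a data frame, one way to inspect what the stat computed is ggplot2's layer_data() helper. This is a sketch under the assumption that the stat exposes its result in a column named count, as the documentation's list of computed variables states; the names p and binned are only illustrative:

```r
library(ggplot2)

# Example data from the question
df <- data.frame(hour = c("1", "2", "1", "2", "3", "2", "3"),
                 name = c("A", "B", "A", "B", "C", "A", "B"))

p <- ggplot(df, aes(hour, name)) +
  stat_bin_2d()

# Data frame that the layer actually draws, one row per rectangle
binned <- layer_data(p)

# Number of cases per rectangle; by default (drop = TRUE)
# rectangles with no cases are omitted
binned$count
```

Unlike as.data.frame(table(df)), this leaves out hour/name combinations that never occur unless drop = FALSE is passed to stat_bin_2d().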