Count Categorical Variables Summarized by Month and Year

31 Views Asked by At

I am attempting to summarize a count of categorical variables by month and year. Below shows what I obtain versus what I desire.

library(dplyr)
library(tidyverse)
library(lubridate)

data <- tibble(Date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-01", "2022-10-01")),
               b = c("honey", "honey", "free", "love"))
               
data

# A tibble: 4 × 2
#  Date       b    
#  <date>     <chr>
#1 2021-10-01 honey
#2 2021-10-01 honey
#3 2021-10-01 free 
#4 2022-10-01 love 

The code:

data %>% group_by(year=year(Date), month=month(Date)) %>% count()

Produces this:

# A tibble: 2 × 3
# Groups:   year, month [2]
#   year month     n
#   <dbl> <dbl> <int>
# 1  2021    10     3
# 2  2022    10     1

But this is what I desire as output ...

#    year month   honey   free    love
#   <dbl> <dbl>   <int>  <int>   <int>
# 1  2021    10       2      1       0
# 2  2022    10       0      0       1

1

There are 1 best solutions below

0
jpsmith On BEST ANSWER

You can use tidyr::pivot_wider() with the values_fn (to count how many occurrences using length) and values_fill (to replace NA values with 0):

library(dplyr)
library(tidyr)

data %>% 
  pivot_wider(names_from = b, 
              values_from = b, 
              values_fn = length,
              values_fill = 0)

Output:

 Date       honey  free  love
  <date>     <int> <int> <int>
1 2021-10-01     2     1     0
2 2022-10-01     0     0     1

If you want to add month in there, just use a simple mutate after pivoting:

data %>% 
  pivot_wider(names_from = b, 
              values_from = b, 
              values_fn = length,
              values_fill = 0) %>%
  mutate(month = lubridate::month(Date))

Output:

 Date       honey  free  love month
  <date>     <int> <int> <int> <dbl>
1 2021-10-01     2     1     0    10
2 2022-10-01     0     0     1    10