group_by multiple variables and calculating percentage within each subgroup in R

Below is my data:

library(tidyverse)

data <- tribble(
    ~fruit, ~transaction, ~season, ~n,

    "a",      "sell",      "s",    14,
    "a",      "sell",      "f",    23,
    "a",      "buy",       "s",    43,
    "a",      "buy",       "f",    15,
    "a",      "keep",      "s",    24,
    "a",      "keep",      "f",    5,

    "o",      "sell",      "s",    10,
    "o",      "sell",      "f",    11,
    "o",      "buy",       "s",    6,
    "o",      "buy",       "f",    9,
    "o",      "keep",      "s",    40,
    "o",      "keep",      "f",    10,
    
    "b",      "sell",      "s",    7,
    "b",      "sell",      "f",    0,
    "b",      "buy",       "s",    10,
    "b",      "buy",       "f",    12,
    "b",      "keep",      "s",    30,
    "b",      "keep",      "f",    2
) %>% as.data.frame()

I want to see what percentage of sells in season s came from fruit a, which is 14/(14+10+7),
or what percentage of keeps in season f came from fruit b, which is 2/(2+10+5),
and then replace the frequency counts with these percentages. I am trying the janitor package, but its calculations are not what I am looking for. Any ideas how to modify it?

library(janitor)

df <- data %>%
    group_by(season) %>%
    adorn_percentages("col") %>%
    mutate(across(where(is.numeric), function(x) round(100 * x, 2)))

Answer by DaveArmstrong:

You could divide each n by the sum of the n values within its season and transaction group. Then, to get it into the shape you want, reshape with pivot_wider. Note that the .by argument used here requires dplyr 1.1.0 or later.

library(tidyverse)

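data %>%
  # share of each fruit within every season/transaction combination
  mutate(n = n / sum(n), .by = c(season, transaction)) %>%
  # one column per transaction type
  pivot_wider(names_from = "transaction", values_from = "n")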

Output:

# A tibble: 6 × 5
  fruit season  sell   buy  keep
  <chr> <chr>  <dbl> <dbl> <dbl>
1 a     s      0.452 0.729 0.255
2 a     f      0.676 0.417 0.294
3 o     s      0.323 0.102 0.426
4 o     f      0.324 0.25  0.588
5 b     s      0.226 0.169 0.319
6 b     f      0     0.333 0.118
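
The output above is in proportions, while the question asks for percentages rounded to two decimals. A minimal sketch of that variation follows. It assumes the same data as in the question (rebuilt compactly here so the snippet runs on its own) and swaps .by for group_by() plus ungroup(), which should also work on dplyr versions older than 1.1.0. The name pct is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same values as the tribble in the question, rebuilt compactly
data <- data.frame(
  fruit       = rep(c("a", "o", "b"), each = 6),
  transaction = rep(rep(c("sell", "buy", "keep"), each = 2), times = 3),
  season      = rep(c("s", "f"), times = 9),
  n           = c(14, 23, 43, 15, 24, 5,
                  10, 11, 6, 9, 40, 10,
                  7, 0, 10, 12, 30, 2)
)

pct <- data %>%
  # one group per season/transaction combination
  group_by(season, transaction) %>%
  # percentage of the group total contributed by each fruit
  mutate(n = round(100 * n / sum(n), 2)) %>%
  ungroup() %>%
  # one column per transaction type
  pivot_wider(names_from = transaction, values_from = n)

print(pct)
```

With this data, the sell value for fruit a in season s comes out as 45.16, which matches 14/(14+10+7) from the question. Because each value is rounded separately, the percentages within a group may not always sum to exactly 100.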