How to count occurrences across rows within the same column by multiple conditions

74 Views Asked by At

I have the following dataset on R:

patent_id CPC
xxxxxx Y02P
xxxxxx H01M
xxxxxx GO1H
xxxxxx AO2A
yyyyyy A01B
yyyyyy Y02E
yyyyyy Y02T
yyyyyy Y04S

For each CPC equal to CPC-Y02 (Y02A, Y02B, Y02C, Y02D, Y02E, Y02P, Y02T, Y02W, Y04S) for each patent_id, I need to count how many times the other CPC occurs, to find to which CPC different from the list, each Y02 are associated the most.

I have tried with duplicating the column of CPC to count across the two columns but I get the number of occurrences fro each CPC with itself:

x <- y %>% group_by(CPC_4digit, CPC_dup) %>% summarise(n=n()) %>% spread(CPC_dup, n, fill = 0L)

Thank you for your time and help!

1

There are 1 best solutions below

2
asd-tm On

Is this what you want?

library(dplyr)
tibble(patent_id = c("xxxxxx",
                     "xxxxxx",
                     "xxxxxx",
                     "xxxxxx",
                     "yyyyyy",
                     "yyyyyy",
                     "yyyyyy",
                     "yyyyyy"),
       CPC = c("Y02P",
               "H01M",
               "GO1H",
               "AO2A",
               "A01B",
               "Y02E",
               "Y02T",
               "Y04S") ) %>% 
  group_by(patent_id, CPC) %>% 
  mutate(times = n()) %>% 
  group_by(patent_id) %>% 
  mutate(totaltimes = n()) %>% 
  transmute(patent_id, CPC, other_times = totaltimes - times)

Result:

# A tibble: 8 × 3
# Groups:   patent_id [2]
  patent_id CPC   other_times
  <chr>     <chr>       <int>
1 xxxxxx    Y02P            3
2 xxxxxx    H01M            3
3 xxxxxx    GO1H            3
4 xxxxxx    AO2A            3
5 yyyyyy    A01B            3
6 yyyyyy    Y02E            3
7 yyyyyy    Y02T            3
8 yyyyyy    Y04S            3