I have the following dataset in R:
| patent_id | CPC |
|---|---|
| xxxxxx | Y02P |
| xxxxxx | H01M |
| xxxxxx | GO1H |
| xxxxxx | AO2A |
| yyyyyy | A01B |
| yyyyyy | Y02E |
| yyyyyy | Y02T |
| yyyyyy | Y04S |
For each Y02-type CPC code (Y02A, Y02B, Y02C, Y02D, Y02E, Y02P, Y02T, Y02W, Y04S), I need to count, within each patent_id, how many times the other CPC codes occur. The goal is to find which CPC code outside this list each Y02 code is most often associated with.
I have tried duplicating the CPC column and counting across the two columns, but I only get the number of occurrences of each CPC with itself:

    x <- y %>%
      group_by(CPC_4digit, CPC_dup) %>%
      summarise(n = n()) %>%
      spread(CPC_dup, n, fill = 0L)

Thank you for your time and help!
Is this what you want?
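Duplicating the column pairs each row only with itself, which is why every CPC ends up counted against itself. One possible approach, as a minimal sketch: split the data into Y02 and non-Y02 rows, join the two parts on `patent_id`, and count the resulting pairs. This assumes dplyr >= 1.0.0 and the sample data from the question, with the column named `CPC` as in the table (rather than `CPC_4digit`). The names `y02`, `pairs`, `top`, and `times` are illustrative, not part of the original data.

```r
library(dplyr)

# Sample data from the question (codes copied verbatim).
y <- tibble(
  patent_id = rep(c("xxxxxx", "yyyyyy"), each = 4),
  CPC = c("Y02P", "H01M", "GO1H", "AO2A", "A01B", "Y02E", "Y02T", "Y04S")
)

# The Y02-type codes listed in the question.
y02 <- c("Y02A", "Y02B", "Y02C", "Y02D", "Y02E", "Y02P", "Y02T", "Y02W", "Y04S")

# distinct() is an assumption: a code repeated within one patent counts once.
y02_rows   <- y %>% filter(CPC %in% y02) %>% distinct()
other_rows <- y %>% filter(!CPC %in% y02) %>% distinct()

# Pair every Y02 code with every non-Y02 code of the same patent,
# then count how often each pair occurs across patents.
pairs <- merge(y02_rows, other_rows, by = "patent_id",
               suffixes = c("_y02", "_other")) %>%
  count(CPC_y02, CPC_other, name = "times")

# For each Y02 code, keep the most frequent partner(s); ties are kept.
top <- pairs %>%
  group_by(CPC_y02) %>%
  slice_max(times, n = 1, with_ties = TRUE) %>%
  ungroup()
```

Base `merge()` is used for the join here only to keep the sketch independent of the dplyr version; `inner_join()` would work the same way. If you prefer the wide layout from your attempt, `pairs` can be reshaped with `tidyr::pivot_wider()`.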
Result: