Rowwise comparison in R for several fields


Suppose you have a dataframe df_matches with transaction data structured in the following way:

| REPORT_ID| VALUE    | SIDE     | COUNTRY  | CP1      | CP2      |...
| -------- | -------- | -------- | -------- | -------- | -------- |...
| ABC123   | 20       | B        | DE       | A        | B        |...
| ABC123   | 20       | S        | FR       | B        | A        |...
| DEF456   | 60       | B        | DE       | A        | C        |...
| DEF456   | 62       | S        | AT       | C        | A        |...
| GHI789   | 75       | B        | NL       | D        | E        |...
| GHI789   | 65       | S        | NL       | E        | D        |...
|...       |...       |...       |...       |...       |...       |...   

I want to calculate similarity measures per REPORT_ID for several attributes and therefore want to change the structure of the dataframe to look like this:

| REPORT_ID| VALUE_1  | VALUE_2  | SIDE_1  | SIDE_2 | CP1_1  | CP1_2   |...
| -------- | -------- | -------- | --------| -------| -------| --------|...
| ABC123   | 20       | 20       | B       | S      | A      | B       |...
| DEF456   | 60       | 62       | B       | S      | A      | C       |...
| GHI789   | 75       | 65       | B       | S      | D      | E       |...
|...       |...       |...       |...      |...     |...     |...      |...

   

Is grouping by REPORT_ID with dplyr and picking the values of the first and second report with first() and last() inside summarise() the most efficient way to do this? My current approach looks like this:

  library(dplyr)  # provides %>%, first() and last()

  df_sim_calc <- df_matches %>% 
    dplyr::group_by(REPORT_ID) %>% 
    dplyr::summarise(VALUE_1 = first(VALUE),
                     VALUE_2 = last(VALUE),
                     SIDE_1 = first(SIDE),
                     SIDE_2 = last(SIDE),
                     CP1_1 = first(CP1),
                     CP1_2 = last(CP1),
                     CP2_1 = first(CP2),
                     CP2_2 = last(CP2))
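
For comparison, here is a minimal sketch of a reshaping alternative that avoids typing one first()/last() pair per column: number the rows within each REPORT_ID and spread them with tidyr::pivot_wider. It assumes exactly two rows per REPORT_ID and that row order decides which report is "1" and which is "2". The helper column name `leg` is made up for illustration, and the toy data only covers the columns shown in the table above. Whether this is actually faster than summarise() on the real data is not something the sketch demonstrates.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the example table (only the columns shown there)
df_matches <- data.frame(
  REPORT_ID = rep(c("ABC123", "DEF456", "GHI789"), each = 2),
  VALUE     = c(20, 20, 60, 62, 75, 65),
  SIDE      = rep(c("B", "S"), times = 3),
  COUNTRY   = c("DE", "FR", "DE", "AT", "NL", "NL"),
  CP1       = c("A", "B", "A", "C", "D", "E"),
  CP2       = c("B", "A", "C", "A", "E", "D")
)

df_sim_calc <- df_matches %>%
  group_by(REPORT_ID) %>%
  mutate(leg = row_number()) %>%  # hypothetical helper: 1 = first row, 2 = second row
  ungroup() %>%
  pivot_wider(
    id_cols     = REPORT_ID,
    names_from  = leg,
    values_from = c(VALUE, SIDE, COUNTRY, CP1, CP2)  # yields VALUE_1, VALUE_2, SIDE_1, ...
  )

print(df_sim_calc)
```

With the paired columns side by side, a rowwise comparison reduces to ordinary vectorised expressions, for example `mutate(VALUE_DIFF = abs(VALUE_1 - VALUE_2), SAME_COUNTRY = COUNTRY_1 == COUNTRY_2)`.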

Thanks!
