What is the tidyr pivot_wider equivalent to the index argument in Python's pivot_table?

118 Views Asked by At

In this question (What is the best way in python to create a confusion matrix out of an already aggregated table with counts of actual labels and predicted labels), it is relatively straightforward to get a confusion matrix of two columns of data by pivoting it using the following function:

df = df.pivot_table(values='Count Occurences', index='Actual Label', columns='Predicted Label')

How would one go about this in R using tidyr::pivot_wider? I have tried several permutations of the following code on a sparklyr table, but it always results in an error (for example, that the names are not unique).

df %>% select(x, y) %>%
group_by(x, y) %>%
count %>%
pivot_wider(id_cols = 'x', names_from = 'y', values_from = 'n')

A solution in R which does work involves the xtabs function, e.g.:

df %>% select(x, y) %>%
group_by(x, y) %>%
count %>%
collect %>%
xtabs(formula = n ~ x + y, data=.)

Is there a solution using pivot_wider?

The following is example data (copied from the linked question) in the same format one would expect after the group_by and count operations are completed on x and y.

df = data.frame(
'x'= c('A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'),
'y'= c('A', 'B', 'C', 'D', 'B', 'C', 'D', 'D'),
'n'= c(200, 150, 100, 150, 50, 100, 70, 80))

If one runs the pivot_wider(names_from = c('x', 'y'), values_from = 'n') command on this data, the output is

    # A tibble: 1 × 8
    A_A   A_B   A_C   A_D   B_B   B_C   C_D   D_D
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1   200   150   100   150    50   100    70    80

Which is not a confusion matrix nor equivalent to Python's output. When run on a sparklyr table of the exact same data, the error message is:

Error in `arrange()`:
ℹ In argument: `__row_num_79a4dd05_ce72_4873_967f_5d5ac9e9f66b`
Caused by error:
! Object `__row_num_79a4dd05_ce72_4873_967f_5d5ac9e9f66b` not found.
Run `rlang::last_error()` to see where the error occurred.
0

There are 0 best solutions below