Warning Message: pairwise_count Function

I'm attempting to follow this tutorial on using the pairwise_count function in the widyr package.

In particular, consider this line of code, where data is a tibble which includes the columns "word" and "section":

data %>% pairwise_count(word, section, sort = TRUE)

However, I received the following warning messages:

  1. distinct_() is deprecated as of dplyr 0.7.0. Please use distinct() instead.
  2. tbl_df() is deprecated as of dplyr 1.0.0. Please use tibble::as_tibble() instead.

I suspect that the pairwise_count function in the widyr package uses some outdated functions, causing these warnings. Is there a more up-to-date package or function in the tidyverse I can use as a replacement? Otherwise, is there a way to use the function without triggering these warnings?

Code from the widyr section of Chapter 4 of Text Mining with R generates deprecation warnings for the distinct_() and tbl_df() functions. Since Chapter 4 contains over 100 lines of code, we whittle it down to the relevant section and the minimum set of packages needed to reproduce the warnings.

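library(dplyr)
library(janeaustenr)
library(tidytext)

# One row per non-stop-word in Pride & Prejudice, tagged with its section
austen_section_words <- austen_books() %>%
     filter(book == "Pride & Prejudice") %>%
     mutate(section = row_number() %/% 10) %>%  # group lines into 10-line sections
     filter(section > 0) %>%                    # drop section 0 (lines 1-9)
     unnest_tokens(word, text) %>%              # one row per word
     filter(!word %in% stop_words$word)         # remove stop words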

austen_section_words

library(widyr)

# count words co-occurring within sections
word_pairs <- austen_section_words %>%
     pairwise_count(word, section, sort = TRUE)

word_pairs 

...generates the following:

> # count words co-occurring within sections
> word_pairs <- austen_section_words %>%
+      pairwise_count(word, section, sort = TRUE)
Warning messages:
1: `distinct_()` is deprecated as of dplyr 0.7.0.
Please use `distinct()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: `tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
> 
> word_pairs
# A tibble: 796,008 x 3
   item1     item2         n
   <chr>     <chr>     <dbl>
 1 darcy     elizabeth   144
 2 elizabeth darcy       144
 3 miss      elizabeth   110
 4 elizabeth miss        110
 5 elizabeth jane        106
 6 jane      elizabeth   106
 7 miss      darcy        92
 8 darcy     miss         92
 9 elizabeth bingley      91
10 bingley   elizabeth    91
# … with 795,998 more rows

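These warnings originate inside widyr itself. widyr::pairwise_count() calls widyr::pairwise_count_(), which uses the deprecated dplyr::distinct_(), as the package source below shows. The tbl_df() warning comes from a different place: widyr's internal custom_melt() helper, as the backtrace further down shows.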

#' @rdname pairwise_count
#' @export
pairwise_count_ <- function(tbl, item, feature, wt = NULL, ...) {
  if (is.null(wt)) {
    func <- squarely_(function(m) m %*% t(m), sparse = TRUE, ...)
    wt <- "..value"
  } else {
    func <- squarely_(function(m) m %*% t(m > 0), sparse = TRUE, ...)
  }

  tbl %>%
    distinct_(.dots = c(item, feature), .keep_all = TRUE) %>%
    mutate(..value = 1) %>%
    func(item, feature, wt) %>%
    rename(n = value)
}

We can see the sources of the warnings when we print the warning messages with lifecycle::last_warnings().

<deprecated>
message: `tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
backtrace:
  9. widyr::pairwise_count(., word, section, sort = TRUE)
 10. widyr::pairwise_count_(...)
  3. dplyr::distinct_(., .dots = c(item, feature), .keep_all = TRUE)
  3. dplyr::mutate(., ..value = 1)
 10. widyr:::func(., item, feature, wt)
 19. widyr:::new_f(tbl, item, feature, value, ...)
  7. widyr:::custom_melt(.)
 15. dplyr::tbl_df(.)

>

At the time of writing, version 0.1.3 of widyr is the current version of the package. Resolving these warnings requires changes inside widyr itself: replacing dplyr::distinct_() in widyr::pairwise_count_() and dplyr::tbl_df() in the internal custom_melt() helper. Since widyr is an actively maintained package, the way to start that process is to report an issue on the widyr GitHub Issues page.

As noted in the text of the warning message, distinct_() has been replaced with dplyr::distinct(), and tbl_df() has been replaced with tibble::as_tibble().
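To illustrate those replacements, here is a minimal sketch of a pairwise count written with current dplyr verbs plus base R's merge(). The name pairwise_count_sketch() is hypothetical and not part of widyr or dplyr. It takes bare column names like pairwise_count() does and mimics widyr's defaults (both orderings of each pair, no self-pairs), but it self-joins the (item, feature) table instead of using widyr's sparse-matrix multiplication.

```r
library(dplyr)

# Hypothetical helper (not part of widyr): for every ordered pair of distinct
# items, count the number of features (e.g. sections) in which both appear.
pairwise_count_sketch <- function(tbl, item, feature, sort = FALSE) {
  # One row per (item, feature) combination: distinct() instead of distinct_()
  df <- tbl %>%
    distinct(item = {{ item }}, feature = {{ feature }})

  # Self-join on feature to pair each item with every item sharing that feature.
  # base::merge() is used because dplyr >= 1.1.0 warns on many-to-many joins.
  pairs <- merge(df, df, by = "feature", suffixes = c("1", "2"))

  pairs %>%
    filter(item1 != item2) %>%            # drop self-pairs (widyr excludes them by default)
    count(item1, item2, sort = sort) %>%  # n = number of shared features
    as_tibble()                           # as_tibble() instead of tbl_df()
}
```

With the data above, pairwise_count_sketch(austen_section_words, word, section, sort = TRUE) should produce counts comparable to widyr's, except that n is an integer rather than a double. The self-join materializes every within-section pair before counting, so it needs more memory than widyr's sparse-matrix approach.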

Suppressing the warnings

One can suppress the warnings produced by pairwise_count() by wrapping the call in suppressWarnings().

library(widyr)
suppressWarnings(
# count words co-occurring within sections
word_pairs <- austen_section_words %>%
     pairwise_count(word, section, sort = TRUE))

...and the output:

> suppressWarnings(
+ # count words co-occurring within sections
+ word_pairs <- austen_section_words %>%
+      pairwise_count(word, section, sort = TRUE))
> 
> word_pairs
# A tibble: 796,008 x 3
   item1     item2         n
   <chr>     <chr>     <dbl>
 1 darcy     elizabeth   144
 2 elizabeth darcy       144
 3 miss      elizabeth   110
 4 elizabeth miss        110
 5 elizabeth jane        106
 6 jane      elizabeth   106
 7 miss      darcy        92
 8 darcy     miss         92
 9 elizabeth bingley      91
10 bingley   elizabeth    91
# … with 795,998 more rows
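suppressWarnings() silences every warning raised inside the call, including unrelated ones you may want to see. A narrower option is to muffle only deprecation warnings. This sketch assumes those warnings carry the condition class lifecycle_warning_deprecated, which is how the lifecycle package tags the deprecation warnings it signals; suppress_deprecated() is a hypothetical helper name.

```r
# Hypothetical helper: evaluate `expr`, muffling only lifecycle deprecation
# warnings. Assumes they carry the class "lifecycle_warning_deprecated";
# any other warning still propagates to the caller.
suppress_deprecated <- function(expr) {
  withCallingHandlers(
    expr,
    lifecycle_warning_deprecated = function(w) invokeRestart("muffleWarning")
  )
}
```

For example, word_pairs <- suppress_deprecated(austen_section_words %>% pairwise_count(word, section, sort = TRUE)) should drop the two deprecation warnings while letting any other warning through. Alternatively, lifecycle documents a global lifecycle_verbosity option; setting options(lifecycle_verbosity = "quiet") should silence these deprecation warnings until the option is reset.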

Appendix

This code was run on version 4.0.2 of R, with the following packages, as reported by sessionInfo():

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidytext_0.2.5    janeaustenr_0.1.5 widyr_0.1.3       tidyr_1.1.1      
[5] dplyr_1.0.2      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       rstudioapi_0.11  magrittr_1.5     tidyselect_1.1.0
 [5] lattice_0.20-41  R6_2.4.1         rlang_0.4.7      fansi_0.4.1     
 [9] stringr_1.4.0    tools_4.0.2      grid_4.0.2       packrat_0.5.0   
[13] broom_0.7.0      utf8_1.1.4       cli_2.0.2        ellipsis_0.3.1  
[17] assertthat_0.2.1 tibble_3.0.3     lifecycle_0.2.0  crayon_1.3.4    
[21] Matrix_1.2-18    purrr_0.3.4      vctrs_0.3.2      tokenizers_0.2.1
[25] SnowballC_0.7.0  glue_1.4.1       stringi_1.4.6    compiler_4.0.2  
[29] pillar_1.4.6     generics_0.0.2   backports_1.1.8  pkgconfig_2.0.3