I am loading a large dataset from which I need to keep approximately 1/20th of the rows (a filter), then group by 5 columns and summarize the 3 remaining ones.
The vroom benchmarks page (https://vroom.r-lib.org/articles/benchmarks.html) says that sampling, filtering, and grouped aggregation are much faster thanks to vroom's lazy ALTREP implementation.
Since "Once a particular vector is fully materialized the speed for all subsequent operations should be identical to a normal R vector." my question is if it makes sense that it could still be advantageous to use dtplyr or data.table for the summarize operation, after filtering?