I want to read a large .txt file into R using the vroom package, because it is fast and supports pipe connections for pre-filtering.
For reproducibility, let's read this UK cats csv file from the Tidy Tuesday project and pre-filter for id == "Ares". The first column corresponds to the tag_id.
The following code returns an empty dataframe. How can I fix the filter, and what changes are required to filter by a regular expression instead of == "Ares"?
library(vroom)
cats_file <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-31/cats_uk.csv"
vroom(
  file = pipe(paste("awk -F ',' '{ if ($1 == 'Ares') { print } }'", cats_file)),
  delim = ","
)
Inside an awk script, literal string values need to be wrapped in double quotes, e.g. $1 == "Ares" rather than $1 == 'Ares'.
Single quotes are used to delimit the awk script/code; in this case the shell sees 3 chunks, { if ($1 == + Ares + ) { print } }, which it concatenates into the single script { if ($1 == Ares) { print } }.
This translates into awk comparing $1 to whatever's in the variable named Ares, which in this case is undefined (aka empty string), so $1 == <empty_string> fails and nothing is printed.
I'm assuming you would need to escape the embedded double quotes inside the R string, e.g.:
file = pipe(paste("awk -F ',' '{ if ($1 == \"Ares\") { print } }'", cats_file)),
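The quoting behaviour described above can be checked directly in a shell, without R. This is a minimal sketch: the temporary file and its rows are made up as a stand-in for cats_uk.csv, and only the first column matters.

```shell
# Toy stand-in for cats_uk.csv (made-up rows, not the real data).
sample=$(mktemp)
printf '%s\n' 'tag_id,event_id' 'Ares,1' 'Athena,2' 'Ares,3' > "$sample"

# Broken: the inner single quotes end the quoted script, so awk receives
# { if ($1 == Ares) { print } } and compares $1 to an unset variable.
broken=$(awk -F ',' '{ if ($1 == 'Ares') { print } }' "$sample")

# Fixed: double quotes make "Ares" a string literal inside the script.
fixed=$(awk -F ',' '{ if ($1 == "Ares") { print } }' "$sample")

printf 'broken: [%s]\n' "$broken"
printf 'fixed:\n%s\n' "$fixed"
rm -f "$sample"
```

The broken form prints nothing, while the fixed form prints only the rows whose first field is exactly Ares.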
NOTE: I don't work with r/vroom, so I'm assuming the rest of OP's code should work once the awk script is modified.
As Ed Morton has mentioned in comments, the following should also work:
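A condensed form that would fit here, assuming the comment meant awk's pattern-only syntax (a pattern with no action block prints the matching line), is sketched below together with a regular-expression variant for the second part of the question. The sample rows and the /^A.*s$/ pattern are made up for illustration.

```shell
# Toy stand-in for cats_uk.csv (made-up rows, not the real data).
sample=$(mktemp)
printf '%s\n' 'tag_id,event_id' 'Ares,1' 'Athena,2' 'Aries,3' 'Tom,4' > "$sample"

# Pattern-only form: with no action block, awk prints every line
# for which the pattern is true.
exact=$(awk -F ',' '$1 == "Ares"' "$sample")

# Regex form: ~ matches the first field against a regular expression.
# NR == 1 additionally keeps the header line so column names survive.
regex=$(awk -F ',' 'NR == 1 || $1 ~ /^A.*s$/' "$sample")

printf '%s\n' "$exact"
printf '%s\n' "$regex"
rm -f "$sample"
```

When the command is built inside an R string, the double quotes would still need to be escaped as \"Ares\". Two further points are worth checking in OP's setup: a plain equality filter also drops the header row, so the NR == 1 condition (or explicit column names on the R side) may be needed, and awk reads local files rather than URLs, so the remote CSV would presumably have to be downloaded first or streamed in, for example with curl -s piped into awk.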